Show output of text analysis in UI #152

Open
blcham opened this issue Feb 22, 2023 · 12 comments

@blcham
Contributor

blcham commented Feb 22, 2023

We have component-failure analysis done:
https://docs.google.com/spreadsheets/d/1qiGzJrXaG4regSN25Y7m3GWMjNsbRrpc/edit#gid=932624392

A scenario describing how to use it is documented here:
https://docs.google.com/document/d/1vR8bsdA5uv32FW5D8WcNp08bMD9yjTmEeZUBvNJbDvI/edit#

The goal of this task is:

  1. design what should be shown to the user in 3 development steps
    a) adjustment with minimal effort
    b) more useful adjustment
    c) advanced adjustment
  2. design model at least for 1a)
  3. implement pipeline in SPipes to create the model
  4. implement planning backend to provide the statistics
  5. implement plan-manager UI to show the statistics
@blcham blcham added the enhancement New feature or request label Feb 22, 2023
@blcham blcham added this to the 2nd Usable Plan Manager milestone Feb 22, 2023
@blcham
Contributor Author

blcham commented Feb 22, 2023

@kostobog
Collaborator

A) Basic use of text analysis results on already processed MWOs

Show results of text analysis without time estimates and component failure count

  • create an RDF data set containing text analysis results, e.g. columns WorkOrderId, TaskCardId, ComponentLabel, FailureLabel, ComponentScore, FailureScore, OriginalText, AggregateScore from sheet Import Data found in Computation and results_test
  • front end should show the component and failure, the aggregate score and original text
  • back end should provide data from the data set to the front end.
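
As a rough illustration of case A, the back end could map each imported row to a small value object and hand the front end only the fields it displays. This is a minimal sketch under assumptions: the names `TextAnalysisRow`, `AnnotationView` and `CaseASketch` are hypothetical, and the sample values are taken from the example row quoted later in this thread.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical value object holding the columns listed for case A.
record TextAnalysisRow(String workOrderId, String taskCardId,
                       String componentLabel, String failureLabel,
                       double componentScore, double failureScore,
                       String originalText, double aggregateScore) {}

// Hypothetical view: only the fields the front end shows in case A.
record AnnotationView(String component, String failure,
                      double aggregateScore, String originalText) {}

class CaseASketch {
    // Project the rows of one work order onto the fields the UI displays.
    static List<AnnotationView> viewsFor(String workOrderId, List<TextAnalysisRow> rows) {
        return rows.stream()
                .filter(r -> r.workOrderId().equals(workOrderId))
                .map(r -> new AnnotationView(r.componentLabel(), r.failureLabel(),
                        r.aggregateScore(), r.originalText()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<TextAnalysisRow> rows = List.of(new TextAnalysisRow(
                "4319224", "DAILY", "life vest", "missing", 1.0, 1.0,
                "pass.cabin (pass.seat): the life jackets in seat 11cba were found missing .",
                1.0));
        System.out.println(viewsFor("4319224", rows));
    }
}
```

In the real system the rows would come from the RDF data set rather than an in-memory list; the list only keeps the sketch self-contained.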

B) Advanced application of text analysis results on already processed MWOs

Create an RDF data set in which pairs of MWOs carry a similarity metric indicating how closely the two MWOs reference similar maintenance work. Using this data set, a work time estimate can be calculated for a selected MWO, provided that it is annotated with a component and a failure. The accuracy of the estimate depends on the quality of the annotations, the similarity measure and the formula used to calculate the estimate. For the purposes of the basic scenario it is sufficient to create a basic implementation of these features:
Limitations

  • The implementation should not be automated and it should work for a static dataset
  • this implies that it won't be applicable to MWOs whose texts are not annotated.

Solution
The solution should contain:

  • Text Analysis dataset - contains the data listed under data requirements in RDF format, plus a similarity measure between MWOs.

  • for a selected MWO front end should show the annotated pair of component and failure, number of (previous?) pair occurrences, time estimate for similar MWOs

  • backend service to support requirements of front end

Creating the text analysis dataset requires implementation of:

  • MWO similarity measure - here are some suggestions

    • if two MWOs have the same pair of component and failure the similarity measure is 1 (identical), else 0 (not similar).
    • if two MWOs have the same pair of component and failure the similarity measure is equal to AggregateScore1 * AggregateScore2.
  • time estimate for MWOs - based on the duration of each MWO and the similarity measure between them - here are some suggestions:

    • group MWOs based on component and failure. Create a resource representing the group and associate MWOs that belong to the group with the group resource. For each group, calculate the work time estimate from work session data. Store the result as a property of the group resource.
  • data requirements

    • columns WorkOrderId, TaskCardId, ComponentLabel, FailureLabel, ComponentScore, FailureScore, OriginalText, AggregateScore from sheet Import Data found in Computation and results_test
    • work session data: cm:id of task types; cm:start-date, cm:start-time, cm:end-date, cm:end-time of work sessions; cm:references-task.
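
The similarity and time-estimate suggestions above can be sketched as follows. This is only an illustration under assumptions: `AnnotatedMwo` and `SimilaritySketch` are hypothetical names, each MWO is assumed to carry exactly one component/failure pair and an already summed work time, the second measure is assumed to be 0 when the pairs differ (the suggestion does not say), and the arithmetic mean is just one possible formula for the group estimate.

```java
import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical annotated MWO: one component/failure pair plus total worked time.
record AnnotatedMwo(String workOrderId, String component, String failure,
                    double aggregateScore, Duration workTime) {}

class SimilaritySketch {
    static boolean samePair(AnnotatedMwo a, AnnotatedMwo b) {
        return a.component().equals(b.component()) && a.failure().equals(b.failure());
    }

    // First suggestion: 1 for an identical component/failure pair, else 0.
    static double binarySimilarity(AnnotatedMwo a, AnnotatedMwo b) {
        return samePair(a, b) ? 1.0 : 0.0;
    }

    // Second suggestion: AggregateScore1 * AggregateScore2 for an identical pair.
    // Returning 0 for different pairs is an assumption.
    static double scoredSimilarity(AnnotatedMwo a, AnnotatedMwo b) {
        return samePair(a, b) ? a.aggregateScore() * b.aggregateScore() : 0.0;
    }

    // Group MWOs by component and failure; estimate = mean work time in minutes.
    // The "component|failure" key stands in for the group resource.
    static Map<String, Double> meanMinutesByGroup(List<AnnotatedMwo> mwos) {
        return mwos.stream().collect(Collectors.groupingBy(
                m -> m.component() + "|" + m.failure(),
                Collectors.averagingLong(m -> m.workTime().toMinutes())));
    }

    public static void main(String[] args) {
        AnnotatedMwo a = new AnnotatedMwo("1", "life vest", "missing", 1.0, Duration.ofMinutes(30));
        AnnotatedMwo b = new AnnotatedMwo("2", "life vest", "missing", 0.5, Duration.ofMinutes(60));
        System.out.println(scoredSimilarity(a, b));          // 0.5
        System.out.println(meanMinutesByGroup(List.of(a, b))); // {life vest|missing=45.0}
    }
}
```

In the RDF data set the group would be a resource with the estimate stored as one of its properties, as described above; the map is used here only to keep the example runnable on its own.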

C) More useful application of text analysis

In this scenario text analysis data should be updated periodically (e.g. each time the updater is executed) and/or on user demand (e.g. when the user selects an MWO that has not been processed by text analysis yet).
This scenario requires the automation of similarity measure and work time estimate calculations.

@blcham
Contributor Author

blcham commented Feb 28, 2023

@kostobog please add a model for case A). @Matthew-Kulich please add an example of extended annotations in termit here
(we can assume in the first iteration that we have only 1:1 for component:failure)

@kostobog
Collaborator

I propose the following draft model for case A).

  • The new part of the model is in orange. The white boxes are models that we already have. Not 100% sure about text analysis.
  • Schema IRIs - those without a prefix (e.g. cm:) still need to be added to the ontology. Prefixed schema elements are already in the ontology.
    image

For case B) the model is almost identical to the one for A).
image

@blcham please review the model.

@blcham
Contributor Author

blcham commented Feb 28, 2023

@kostobog LGTM (there are just minor issues that I believe will be resolved by the following example, as you could not see what the output of text analysis looks like)
@Matthew-Kulich please post example instances of those structures here, so we can be sure that we are on the same page.

Some issues are:

  1. text analysis row -- since each row can contain 1:N or N:1 relations of "is-failure-of-component", we must create a specific instance of the tuple <component,failure> instead of using the text analysis row. Moreover, I suggest linking such an instance to the "text analysis row". I would perhaps call it
    failure-occurrence -- where the IRI depends_on(annotated text [1], failure, component), in a similar way to "failure individual". failure-occurrence should be a specific type of occurrence; let's call it relation-occurrence (vyskyt-relace) instead of term-occurrence (vyskyt-termu). The schema for vyskyt-relace should be similar to that of vyskyt-termu.

[1] - The IRI of vyskyt-relace should be created in a similar manner to that of vyskyt-termu. I believe the latter's IRI depends_on(annotated text, class); now it should be IRI depends_on(annotated text, component_class, failure_class). This way we might end up with multiple "failure individual"-s originating from one vyskyt-relace (a little strange, but I believe OK, as text analysis does not understand the context around the string literal representing "WO text". This would change if we incorporated "WO action", which is however not the case now).

@Matthew-Kulich
Collaborator

Matthew-Kulich commented Feb 28, 2023

This is what the term occurrence looks like:
(termit:má-skóre, termit:odkazuje-na-anotaci and termit:odkazuje-na-anotovaný-text were created by us and are not part of the termit model)

<http://onto.fel.cvut.cz/ontologies/application/termit/pojem/výskyt-termu/instanceba805d17360b599fc0e726b64e58ecca>
        a                           termit:výskyt-termu ;
        termit:je-přiřazením-termu  cm:missing ;
        termit:má-cíl               [ a                   termit:cíl-výskytu ;
                                      termit:má-selektor  [ a                           termit:selektor-pozici-v-textu ;
                                                            termit:má-koncovou-pozici   "73"^^xsd:int ;
                                                            termit:má-startovní-pozici  "66"^^xsd:int
                                                          ] ;
                                      termit:má-selektor  [ a                            termit:selektor-text-quote ;
                                                            termit:má-prefix-text-quote  "pass.cabin (pass.seat): the life jackets in seat 11cba were found " ;
                                                            termit:má-přesný-text-quote  "missing" ;
                                                            termit:má-suffix-text-quote  " ."
                                                          ]
                                    ] ;
        termit:má-skóre             "1.0"^^xsd:float ;
        termit:odkazuje-na-anotaci  "<span about=\"_:f4d0-1930\" property=\"ddo:je-výskytem-termu\" resource=\"http://onto.fel.cvut.cz/ontologies/slovnik/slovník-komponent-a-zádavd---markéta-dp-lower-case/pojem/missing\" typeof=\"ddo:výskyt-termu\" score=\"1.0\">missing</span>" ;
        termit:odkazuje-na-anotovaný-text
                "pass.cabin (pass.seat): the <span about=\"_:f4d0-1928\" property=\"ddo:je-výskytem-termu\" resource=\"http://onto.fel.cvut.cz/ontologies/slovnik/slovník-komponent-a-zádavd---markéta-dp-lower-case/pojem/life-vest\" typeof=\"ddo:výskyt-termu\" score=\"1.0\">life jackets</span> in <span about=\"_:f4d0-1929\" property=\"ddo:je-výskytem-termu\" resource=\"http://onto.fel.cvut.cz/ontologies/slovnik/slovník-komponent-a-zádavd---markéta-dp-lower-case/pojem/seat-cover\" typeof=\"ddo:výskyt-termu\" score=\"0.5\">seat</span> 11cba were found <span about=\"_:f4d0-1930\" property=\"ddo:je-výskytem-termu\" resource=\"http://onto.fel.cvut.cz/ontologies/slovnik/slovník-komponent-a-zádavd---markéta-dp-lower-case/pojem/missing\" typeof=\"ddo:výskyt-termu\" score=\"1.0\">missing</span> ." .
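
To make the two selectors in the example above easier to read, here is a small sketch that checks them against the plain (un-annotated) work order text. The record and class names are hypothetical, and treating the end position as exclusive is an assumption that happens to fit this sample (positions 66 to 73 for the 7-character word "missing").

```java
// Hypothetical plain-Java mirrors of the two selector kinds in the example.
record TextPositionSelector(int start, int end) {}
record TextQuoteSelector(String prefix, String exact, String suffix) {}

class SelectorSketch {
    // Text addressed by a position selector; end is treated as exclusive.
    static String select(String text, TextPositionSelector s) {
        return text.substring(s.start(), s.end());
    }

    // True when the position selector and the quote selector describe the same span.
    static boolean agree(String text, TextPositionSelector pos, TextQuoteSelector quote) {
        return select(text, pos).equals(quote.exact())
                && text.substring(0, pos.start()).endsWith(quote.prefix())
                && text.substring(pos.end()).startsWith(quote.suffix());
    }

    public static void main(String[] args) {
        String text = "pass.cabin (pass.seat): the life jackets in seat 11cba were found missing .";
        TextPositionSelector pos = new TextPositionSelector(66, 73);
        TextQuoteSelector quote = new TextQuoteSelector(
                "pass.cabin (pass.seat): the life jackets in seat 11cba were found ",
                "missing", " .");
        System.out.println(select(text, pos));       // missing
        System.out.println(agree(text, pos, quote)); // true
    }
}
```

Note that the positions refer to the original text, not to the annotated text with the embedded span elements.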

This is what the final Row looks like:

<http://onto.fel.cvut.cz/resources/c979c10bb61dc80fab774d437f04962d-extended#row-410>
        a                         csvw:Row ;
        :AggregateScore           "1.0"^^xsd:float ;
        :AnnotatedText            "pass.cabin (pass.seat): the <span about=\"_:f4d0-1928\" property=\"ddo:je-výskytem-termu\" resource=\"http://onto.fel.cvut.cz/ontologies/slovnik/slovník-komponent-a-zádavd---markéta-dp-lower-case/pojem/life-vest\" typeof=\"ddo:výskyt-termu\" score=\"1.0\">life jackets</span> in <span about=\"_:f4d0-1929\" property=\"ddo:je-výskytem-termu\" resource=\"http://onto.fel.cvut.cz/ontologies/slovnik/slovník-komponent-a-zádavd---markéta-dp-lower-case/pojem/seat-cover\" typeof=\"ddo:výskyt-termu\" score=\"0.5\">seat</span> 11cba were found <span about=\"_:f4d0-1930\" property=\"ddo:je-výskytem-termu\" resource=\"http://onto.fel.cvut.cz/ontologies/slovnik/slovník-komponent-a-zádavd---markéta-dp-lower-case/pojem/missing\" typeof=\"ddo:výskyt-termu\" score=\"1.0\">missing</span> ." ;
        :CSAT_WO_TC               "4319224" ;
        :ComponentScore           "1.0"^^xsd:float ;
        :ComponentUri             cm:life-vest ;
        :DocumentLineNumber       "409" ;
        :FailureScore             "1.0"^^xsd:float ;
        :FailureUri               cm:missing ;
        :FinalComponentUri        cm:life-vest ;
        :FinalFailureUri          cm:missing ;
        :FoundComponentsCount     2 ;
        :FoundFailuresCount       1 ;
        :IsConfirmed              "FALSE" ;
        :MultipleComponents       "life vest, seat cover" ;
        :MultipleFailures         "missing" ;
        :No_                      "409." ;
        :OriginalText             "pass.cabin (pass.seat): the life jackets in seat 11cba were found missing ." ;
        :SelectedComponentLabels  "life vest" ;
        :SelectedComponentsCount  1 ;
        :SelectedFailureLabels    "missing" ;
        :SelectedFailuresCount    1 ;
        :TC_reference             "DAILY" ;
        :TaskCardId               "DAILY" ;
        :WO_text                  "pass.cabin (pass.seat): the <span about=\"_:f4d0-1928\" property=\"ddo:je-výskytem-termu\" resource=\"http://onto.fel.cvut.cz/ontologies/slovnik/slovník-komponent-a-zádavd---markéta-dp-lower-case/pojem/life-vest\" typeof=\"ddo:výskyt-termu\" score=\"1.0\">life jackets</span> in <span about=\"_:f4d0-1929\" property=\"ddo:je-výskytem-termu\" resource=\"http://onto.fel.cvut.cz/ontologies/slovnik/slovník-komponent-a-zádavd---markéta-dp-lower-case/pojem/seat-cover\" typeof=\"ddo:výskyt-termu\" score=\"0.5\">seat</span> 11cba were found <span about=\"_:f4d0-1930\" property=\"ddo:je-výskytem-termu\" resource=\"http://onto.fel.cvut.cz/ontologies/slovnik/slovník-komponent-a-zádavd---markéta-dp-lower-case/pojem/missing\" typeof=\"ddo:výskyt-termu\" score=\"1.0\">missing</span> ." ;
        :WorkOrderId              "4319224" ;
        :compUri                  <http://onto.fel.cvut.cz/ontologies/application/termit/pojem/výskyt-termu/instancefd70a3a654480d5e88a9998c175d5665> ;
        :componentScore           "1.0"^^xsd:float ;
        :failureScore             "1.0"^^xsd:float ;
        :finalComp                <http://onto.fel.cvut.cz/ontologies/application/termit/pojem/výskyt-termu/instancefd70a3a654480d5e88a9998c175d5665> , <http://onto.fel.cvut.cz/ontologies/application/termit/pojem/výskyt-termu/instance1cfcaf61949271c9378edd7e3f86fac0> ;
        :finalFailure             <http://onto.fel.cvut.cz/ontologies/application/termit/pojem/výskyt-termu/instanceba805d17360b599fc0e726b64e58ecca> ;
        :mulComponents            cm:life-vest , cm:seat-cover ;
        :mulFailures              cm:missing ;
        :multipleComponents       "life vest"@en , "seat cover"@en ;
        :multipleFailures         "missing"@en ;
        :occurrences              <http://onto.fel.cvut.cz/ontologies/application/termit/pojem/výskyt-termu/instanceba805d17360b599fc0e726b64e58ecca> , <http://onto.fel.cvut.cz/ontologies/application/termit/pojem/výskyt-termu/instance1cfcaf61949271c9378edd7e3f86fac0> , <http://onto.fel.cvut.cz/ontologies/application/termit/pojem/výskyt-termu/instancefd70a3a654480d5e88a9998c175d5665> ;
        :selectedComponents       "life vest"@en ;
        :selectedFailures         "missing"@en .

@blcham
Contributor Author

blcham commented Feb 28, 2023

@Matthew-Kulich 👍 and how do you create the IRI http://onto.fel.cvut.cz/ontologies/application/termit/pojem/výskyt-termu/instanceba805d17360b599fc0e726b64e58ecca ? Is it concat(?instancePrefix, md5(?annotatedText)) ?

@kostobog could you add the source graphml file here? :)

@Matthew-Kulich
Collaborator

Matthew-Kulich commented Feb 28, 2023

The IRI for the occurrences is created in the SPipes module and done as you wrote.

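// Hash the unescaped string form of e to get a stable IRI suffix.
String hash = DigestUtils.md5Hex(StringEscapeUtils.unescapeJava(e.toString()));
// The occurrence IRI is the výskyt-termu IRI plus "/instance" plus the hash.
Resource termOccurrence = outputModel.createResource(Termit.VYSKYT_TERMU + "/instance" + hash);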
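
For readers without the SPipes module at hand, the same idea can be sketched with the JDK only. This is not the module's code: `OccurrenceIriSketch` and its methods are hypothetical, what exactly `e` contains is not shown in the thread, and the way the relation-occurrence key is assembled (annotated text, component class and failure class joined by newlines) is just one possible reading of the depends_on(annotated text, component_class, failure_class) proposal earlier in this thread.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class OccurrenceIriSketch {
    // IRI of the term-occurrence class, as seen in the examples in this thread.
    static final String TERM_OCCURRENCE =
            "http://onto.fel.cvut.cz/ontologies/application/termit/pojem/v\u00fdskyt-termu";

    // Lower-case hex MD5 of the UTF-8 bytes, padded to 32 characters.
    static String md5Hex(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException("MD5 not available", ex);
        }
    }

    // Same shape as the IRIs above: <class IRI>/instance<md5>.
    static String termOccurrenceIri(String annotation) {
        return TERM_OCCURRENCE + "/instance" + md5Hex(annotation);
    }

    // Hypothetical relation-occurrence IRI: the hash covers text, component and failure.
    static String relationOccurrenceIri(String prefix, String annotatedText,
                                        String componentIri, String failureIri) {
        return prefix + md5Hex(annotatedText + "\n" + componentIri + "\n" + failureIri);
    }

    public static void main(String[] args) {
        System.out.println(termOccurrenceIri("<span>missing</span>"));
        System.out.println(relationOccurrenceIri(
                "http://onto.fel.cvut.cz/ontologies/csat-maintenance/failure-occurrence--",
                "annotated text", "cm:life-vest", "cm:missing"));
    }
}
```

Because the hash is deterministic, re-running the pipeline on the same input yields the same IRIs, while any change in the text, component or failure produces a different one.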


@kostobog
Collaborator

kostobog commented Mar 2, 2023

I updated the diagram according to your notes @blcham.
I removed the row and focused on relation occurrences. Here is some basic description:

  • relation-instance - reification of the relation and its relata
  • failure-instance - like relation-instance but for a specific relation type, i.e., has-failure. For example, has-failure between the failure 'crack' and the component 'wing'.

image

text-analysis-application-model.zip

@kostobog
Collaborator

Posting a new version of the model:

image

text-analysis-application-model.zip

@blcham
Contributor Author

blcham commented Mar 30, 2023

Feedback from @kostobog (partly fixed by me) :


PREFIX cm: <http://onto.fel.cvut.cz/ontologies/csat-maintenance/>
PREFIX termit: <http://onto.fel.cvut.cz/ontologies/application/termit/pojem/>
PREFIX : <http://onto.fel.cvut.cz/ontologies/csat/time-analysis-0.1/>

PREFIX vocabulary-prefix: <http://onto.fel.cvut.cz/ontologies/slovnik/slovník-komponent-a-zádavd---markéta-dp-lower-case/pojem/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT (COUNT(?s) as ?c) {
    ?taskStepExecution a cm:task-step-execution. # ok
    ?taskStepExecution cm:described-finding ?findingIndividual. # ok

    ?findingIndividual a ?finding. # ok
    ?findingIndividual cm:has-failure-occurrence ?failureOccurrence. # ok

    ?failureOccurrence a ?failureOccurrence. # ok
    ?failureOccurrence termit:má-skóre ?score. # TODO missing score, should have the value of aggregate score

    ?failureOccurrence termit:je-přiřazením-relace ?hasFailure. # ok
    ?failureOccurrence termit:založeno-na-výskytu-termu ?vyskytuTermu1. # ok
    ?failureOccurrence termit:založeno-na-výskytu-termu ?vyskytuTermu2. # ok
    ?failureOccurrence termit:založeno-na-výskytu-termu ?vyskytuTermu3aVic. # TODO failureOccurrence should contain only 2 vyskytuTermu: a failure and a component. Some failureOccurrence instances have more than 2 vyskytuTermu, e.g. cm:failure-occurrence--85cad73a95988d165d4ed5d4805a17c3 has 4.

    ?hasFailure a ?relationInstanceType. # missing type or super class, not sure what the type should be.
    ?hasFailure <http://onto.fel.cvut.cz/ontologies/csat/enhance-wo-text-0.1/has-relation> cm:has-failure. # missing relation
    ?hasFailure <http://onto.fel.cvut.cz/ontologies/csat/enhance-wo-text-0.1/has-argument1> ?vyskytuTermu1. # wrong object: it is ?vyskytuTermu. has-argument1 should point to the component type, e.g. a component IRI such as vocabulary-prefix:winglet-skin-panel. See the correct triple below.
    ?hasFailure <http://onto.fel.cvut.cz/ontologies/csat/enhance-wo-text-0.1/has-argument1> ?component. # an example of ?component is vocabulary-prefix:winglet-skin-panel
    ?hasFailure <http://onto.fel.cvut.cz/ontologies/csat/enhance-wo-text-0.1/has-argument2> ?vyskytuTermu2. # wrong object: it is ?vyskytuTermu. has-argument2 should point to the failure type, e.g. a failure IRI such as vocabulary-prefix:incorrect-installation. See the correct triple below.
    ?hasFailure <http://onto.fel.cvut.cz/ontologies/csat/enhance-wo-text-0.1/has-argument2> ?failure . # an example of failure is vocabulary-prefix:incorrect-installation
}
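
The TODO about failure occurrences with more than two term occurrences could be turned into a simple check. The sketch below works on an in-memory map rather than on the RDF data, so it only illustrates the rule; `OccurrenceCheckSketch` is a hypothetical name and the IRIs in `main` are made-up placeholders. Against the real data set the same rule would more naturally be a SPARQL query grouped by failure occurrence.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class OccurrenceCheckSketch {
    // Keys: failure-occurrence IRIs; values: the term occurrences each is based on.
    // Returns, in sorted order, the failure occurrences that do not have exactly
    // two term occurrences (one component and one failure).
    static List<String> violations(Map<String, List<String>> termOccurrencesByFailureOccurrence) {
        return termOccurrencesByFailureOccurrence.entrySet().stream()
                .filter(e -> e.getValue().size() != 2)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Placeholder identifiers, not taken from the real data.
        Map<String, List<String>> data = Map.of(
                "cm:failure-occurrence--a", List.of("term-1", "term-2"),
                "cm:failure-occurrence--b", List.of("term-1", "term-2", "term-3", "term-4"));
        System.out.println(violations(data)); // [cm:failure-occurrence--b]
    }
}
```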

@blcham
Contributor Author

blcham commented Jul 31, 2023

This ticket should be closed after creating some documentation of how it works.
