monarch-kg takes about 10 hours to validate, so I'm spending some time with a profiler to see what's up.
I started with kg-phenio and found that it ran in a very reasonable amount of time, and nothing jumped out at me in the profiler as a problem.
For the last hour or so, I've been running the validator on monarch-kg, and nearly all of the compute time is being taken up by the `log_error` method (it was 92% last time I checked, and is now at 99.9%).
Looking at the code, my guess would be that all of the `not in` checks get progressively more expensive as the number of errors goes up:
def log_error(
    self,
    entity: str,
    error_type: ErrorType,
    message: str,
    message_level: MessageLevel = MessageLevel.ERROR,
):
    """
    Log an error to the list of such errors.

    :param entity: source of parse error
    :param error_type: ValidationError ErrorType,
    :param message: message string describing the error
    :param message_level: ValidationError MessageLevel
    """
    # index errors by entity identifier
    level = message_level.name
    error = error_type.name

    # clean up entity name string...
    entity = str(entity).strip()

    if level not in self.errors:
        self.errors[level] = dict()

    if error not in self.errors[level]:
        self.errors[level][error] = dict()

    # don't record duplicate instances of error type and
    # messages for entity identifiers...
    if message not in self.errors[level][error]:
        self.errors[level][error][message] = [entity]
    else:
        if entity not in self.errors[level][error][message]:
            self.errors[level][error][message].append(entity)
If I just replace the contents of `log_error` with `pass`, monarch-kg (w/ 7m edges, 0.8m nodes) runs in 11 minutes for me. I think that means the validation code itself doesn't have performance problems, only the code that tracks validation errors.
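Of the four `not in` checks in `log_error`, the first three are dictionary lookups; only the last one scans a list (the per-message list of entities), and that scan gets slower as the list grows. A minimal sketch of one possible fix, assuming that list scan is the bottleneck: store the entities in a set so the duplicate check is a hash lookup. The `ErrorLog` class and the two enums below are hypothetical stand-ins, not the validator's actual classes, and I haven't profiled this against monarch-kg.

```python
from collections import defaultdict
from enum import Enum


class MessageLevel(Enum):
    # hypothetical stand-in for the validator's MessageLevel
    ERROR = 1
    WARNING = 2


class ErrorType(Enum):
    # hypothetical stand-in for the validator's ErrorType
    INVALID_CATEGORY = 1
    MISSING_NODE_PROPERTY = 2


class ErrorLog:
    """Hypothetical container: level -> error type -> message -> set of entities."""

    def __init__(self):
        # Nested defaultdicts create the intermediate levels on first access,
        # replacing the explicit `not in` checks on the dictionaries.
        self.errors = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))

    def log_error(
        self,
        entity: str,
        error_type: ErrorType,
        message: str,
        message_level: MessageLevel = MessageLevel.ERROR,
    ):
        # clean up entity name string, as in the original method
        entity = str(entity).strip()
        # set.add() ignores duplicates, so no membership scan is needed
        self.errors[message_level.name][error_type.name][message].add(entity)


log = ErrorLog()
log.log_error(" node:1 ", ErrorType.INVALID_CATEGORY, "bad category")
log.log_error("node:1", ErrorType.INVALID_CATEGORY, "bad category")  # duplicate
log.log_error("node:2", ErrorType.INVALID_CATEGORY, "bad category")
```

One trade-off: a set does not keep insertion order, so if anything downstream relies on the order of the entity list (or needs JSON-serializable lists), the sets would have to be converted back with something like `sorted(...)` when the errors are reported. A plain `dict` used as an ordered set would keep the order with the same lookup cost.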