Use-case quality measurement #61
Replies: 2 comments 1 reply
-
Would it be possible to use SQL to do a preselection, so you don't need to implement all of CQL on the RDBMS side but just make sure the amount of data is reduced as much as possible, and then do the final selection on the "client" side of things?
-
This is not an inherent flaw in the open-source Java engine. It was intentionally designed to support architectures such as the one Evan is suggesting. Apache Spark and other JVM big-data platforms can distribute the open-source CQL engine libraries to the nodes where the data lives and run calculations in situ. In fact, an engineer at Google published an example of such a use case for Beam: https://github.com/google/cql-on-beam. Alphora also published an example for Spark a couple of years ago: https://github.com/DBCG/spark-cql-fhir. Other, deeper integrations are also possible and are being developed commercially.
-
NCQA's primary interest is burden reduction in the quality measurement space.
We're best known for authoring the HEDIS® measures, which every health plan in the United States is required to compute and report annually. This is a significant market, and there is no single HEDIS® reference implementation that everyone uses; many companies' sole business is computing and submitting quality measures on behalf of payers. The US health care system spends at least $1bn on this annually.
NCQA is currently digitizing HEDIS® into FHIR CQL measures.
CQL is translated to an intermediate logical model called ELM (Expression Logical Model).
There are a couple of open-source CQL execution engines in the space:
With all of these engines, the flaw is that they require bringing the data to the execution.
Imagine if SQL didn't exist, and if you wanted to query a database, you had to download the entire database to a processing node to run the query.
That's the state of all current CQL engines.
We can pump around 1m patients through our engine in about an hour. The majority of the time is spent downloading the data, writing results to disk cache, and uploading the results back to the store.
Bringing the execution to the data is a much smarter strategy for population health. SQL is the best tool for that job.
Here's a very simple CQL library:
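(The original snippet was not preserved in this copy; a plausible minimal library of the shape being discussed, with illustrative names, would be:)

```cql
library SimpleExample version '1.0.0'

using FHIR version '4.0.1'

context Patient

define "Born After 1980":
  Patient.birthDate > @1980-12-31
```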
Here's what the (very abbreviated) ELM looks like:
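(Also not preserved; an abbreviated ELM reconstruction for a library like the one above, with field values that are illustrative rather than exact, would look roughly like:)

```json
{
  "name": "Born After 1980",
  "context": "Patient",
  "expression": {
    "type": "Query",
    "source": [
      { "alias": "P",
        "expression": { "type": "Retrieve", "dataType": "{http://hl7.org/fhir}Patient" } }
    ],
    "where": {
      "type": "Greater",
      "operand": [
        { "type": "Property", "path": "birthDate", "scope": "P" },
        { "type": "Literal", "valueType": "{urn:hl7-org:elm-types:r1}Date", "value": "1980-12-31" }
      ]
    }
  }
}
```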
Now it should be fairly clear how one could translate ELM into a SQL query. By visiting the Query expression we can take its pieces (e.g. the `where` property) and translate them into corresponding SQL queries or meta-queries, as we discussed.

Taking the example Parquet files Nikolai gave us, I loaded them into a DuckDB instance. Since I am a C# guy, I used this NuGet package to use ADO.NET to interact with an in-memory DB.
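As an aside, the visiting step can be sketched in a few lines of Python (our actual implementation is C#; the ELM shape here is abbreviated, and the resource-type-to-table naming convention is an assumption):

```python
# Minimal ELM-to-SQL translator sketch. It handles only the handful of node
# types needed for the birthDate example; a real translator would dispatch
# on dozens of ELM expression types.

def translate(node):
    """Visit an ELM expression node (as parsed JSON) and emit a SQL fragment."""
    t = node["type"]
    if t == "Query":
        src = node["source"][0]
        table = translate(src["expression"])
        sql = f'SELECT * FROM {table} AS {src["alias"]}'
        if "where" in node:
            sql += f' WHERE {translate(node["where"])}'
        return sql
    if t == "Retrieve":
        # Map the FHIR resource type to a table/view name (assumed convention).
        return node["dataType"].split("}")[-1].lower()
    if t == "Greater":
        left, right = (translate(op) for op in node["operand"])
        return f"{left} > {right}"
    if t == "Property":
        return f'{node["scope"]}.{node["path"]}'
    if t == "Literal":
        return f"DATE '{node['value']}'"
    raise NotImplementedError(t)

elm = {
    "type": "Query",
    "source": [{"alias": "P",
                "expression": {"type": "Retrieve",
                               "dataType": "{http://hl7.org/fhir}Patient"}}],
    "where": {"type": "Greater",
              "operand": [{"type": "Property", "path": "birthDate", "scope": "P"},
                          {"type": "Literal", "value": "1980-12-31"}]},
}

print(translate(elm))
# SELECT * FROM patient AS P WHERE P.birthDate > DATE '1980-12-31'
```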
Ironically, STRUCT types are not supported, so if I want to use C# I need to unroll our structures using views. This doesn't affect this simple example, but it would if we were handling e.g. Observations, which have STRUCT columns for codeable concepts.
The Parquet files treat patient.birthDate as a VARCHAR, so I created a simple view to cast it as a Date:
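(Reconstructed; assuming the Parquet data was loaded as a table named `patient`, something like the following, using DuckDB's `SELECT * REPLACE` clause:)

```sql
CREATE VIEW patient_view AS
SELECT * REPLACE (CAST(birthDate AS DATE) AS birthDate)
FROM patient;
```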
Then using an ELM visitor pattern I implemented a basic query translator which takes the above ELM and creates:
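(The generated SQL was not preserved here; presumably something along the lines of:)

```sql
SELECT P.*
FROM patient_view AS P
WHERE P.birthDate > DATE '1980-12-31'
```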
But this is too simple.
Real CQL libraries are not as simple as above. CQL has dozens of keywords that will almost certainly not map 1:1 to SQL functions. Even in the above example, if I change the CQL to this:
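(Reconstructed; given the discussion of year-precision dates that follows, presumably a comparison against a year-only literal:)

```cql
define "Born After 1980":
  Patient.birthDate > @1980
```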
This would not work in DuckDB, at least. One could say that anyone whose birth date is after 1980 is anyone whose birth date is 1981-01-01 or later. Certainly the query translator can do this rewrite, but it wouldn't be correct in all cases. For example, in CQL:
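(The original expression was not preserved; a comparison of this shape exhibits the behavior described:)

```cql
@1980-06-15 > @1980
```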
will evaluate to `null`, because the answer is uncertain. `@1980` is not a single point in time; it's actually the interval [1980-01-01, 1980-12-31], and comparing a value that lies inside that interval to the interval using greater-than is undefined. Therefore I think what is needed is the ability to create scalar and table-valued functions that implement these rules.
This is a simple example, but there are many more. When we started converting CQL to C#, we began by trying to use native .NET syntax (like greater-than), but in the end we just turned everything into a function. Every single operator in CQL had some subtlety that made it not work with standard C# syntax, mostly because in CQL everything can be null, and C# doesn't allow you to compare nullable values using standard operators; it requires that you coalesce the values first.
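The everything-is-a-function approach can be sketched in Python rather than C# (illustrative names; `None` stands in for CQL's null, and a `(lo, hi)` tuple stands in for a year-precision date):

```python
from datetime import date

# CQL-style three-valued "greater than": any null operand yields null, and
# a comparison that is uncertain against a year-precision date also yields null.

def cql_greater(left, right):
    """left: a date or None; right: a date, a (lo, hi) interval standing in
    for an imprecise date such as @1980, or None.
    Returns True, False, or None (unknown/uncertain)."""
    if left is None or right is None:
        return None                      # null propagates, as in CQL
    if isinstance(right, tuple):         # imprecise date, e.g. @1980
        lo, hi = right
        if left > hi:
            return True                  # definitely after the whole interval
        if left <= lo:
            return False                 # cannot possibly be after it
        return None                      # inside the interval: uncertain
    return left > right

year_1980 = (date(1980, 1, 1), date(1980, 12, 31))  # @1980 as an interval

print(cql_greater(date(1980, 6, 15), year_1980))  # None (uncertain)
print(cql_greater(date(1981, 1, 1), year_1980))   # True
print(cql_greater(None, year_1980))               # None (null propagates)
```

A translator emitting SQL would need equivalent scalar functions on the database side, since plain SQL `>` has none of these semantics.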
I think we would need to write these functions in "meta SQL" so we can translate them for various RDBMS platforms. I'm not concerned with 100% coverage out of the box, but they should at least be translatable to ANSI SQL.
If we use a lot of functions in queries, they will be slower, but they would have to be dramatically slower before that overhead outweighed the I/O cost of pulling all the data out of the tables for an off-platform computation engine like those listed above.
If we achieved this, we would enable all CQL digital measures to execute against any schema-compliant platform (like Aidbox), and also against any platform whose schema can be mapped to our schema by some mechanism.
Virtually every payer in the world runs on an RDBMS, so this would be huge for them.