TigerShark is an X12 EDI message parser that can be tailored to a specific partner in the health care payment ecosystem.
TigerShark is used to transform X12 schema definitions into Plain old Python class definitions. The class definitions are used by an application to process an X12 data stream.
TigerShark's generated code can parse X12 messages, providing Python objects that can be used in a variety of ways. A web application can be built using any Python framework. A Juypter Lab notebook can be built to do analysis work on claim data. Python-based middleware can be written to process claim files.
The X12 message definitions from from the PyX12 project, see https://github.com/azoner/pyx12.
The X12 syntax defines a complex more-or-less hierarchical structure of files that have messages. Messages have Loops (that can repeat), Segments, and individual Data Elements.
The core idea is that a Segment is a group of related Data Elements. A segment has an identifier, making it possible to identify repeating loops and any omitted segments.
The segments are often terminated with a ~
and data elements terminated with *
or |
.
A complete schema is required, including details of fields that are optional.
We rely on https://github.com/azoner/pyx12 for the definition of the message.
(Ideally, we'd go to the source documents, including the X12 standard and Implementation Guides. We don't know where to find these.)
- Clone and check out https://github.com/azoner/pyx12.
cd tigershark3/tools
.- Put
tigershark
on thePYTHONPATH
environment variable. - Run the
xml_extract.py
application to build the message classes. All of the parameters are in the__name__ == "__main__"
block.
cd tigershar3/tools
PYTHONPATH=.. python xml_extract.py
Complete test:
tox
Run marked tests with detailed logging. Very handy for debugging.
PYTHONPATH=tigershark3 pytest -m x12parser --log-cli-level=DEBUG tests
This project uses pip-tools
to create a detailed requirements-dev.txt
from the pyproject.toml
pip-compile --extra=dev --extra=test -o requirements-dev.txt
A foundational rewrite. This uses Python type annotations to define message structure.
The tools/xml_extract.py
tool converts PyX12 XML message definitions into
pure Python classes simply and directly.
It can emit a JSON Schema description of messages, also. This permits transforming raw data to JSON for analysis and processing.
Lots of 271 bugfixes! Several tests for 271 files have been added. A few parsing bugs have been fixed.
A nice change is that the parser no longer crashes if there is an invalid code in an X12 element. This was causing me nothing but grief so I disabled it. Valid codes are only checked when there is a single valid code for a segment, since this is important in determining loop boundaries for 271 files.
ParseErrors return a more helpful error message so tracking down a bad line is much easier.
Important bugfix that prevents an early parser exit if an optional segment isn't found, but later optional segments are present.
PyX12, which this project depends on, changed its project layout (in version 2.0.0), so the parser generation scripts have been updated to look in the new directory.
I discovered a bug that caused deductible/co-insurance/co-payments from being summed if they occurred at the claims-level rather than the adjustments level. This resulted in underreporting the actual amounts. This has been fixed and unit tests have been added for this case.
I also made adjustments to the directory structure. Tests have been moved up and the latest version of PyX12 was used to generate the parsers. An old PyX12 tarball is no longer included in this distribution, so instructions were added to get PyX12 and set it up for parser generation.
Initial support for reading 270/271 files. I'm not sure when I'll add support for creating X12 files, since I have yet to need to do so. I haven't even tested creating them.
SegmentAccess, SegmentSequenceAccess, and X12SegmentBridge now all work pretty well. I didn't really like how the 835 facade was implemented so I spent more time trying to figure out nicer ways of structuring the 270/271 facades. This meant understanding and fixing the Segment[Sequence]Access classes. I was able to avoid a bunch of ugly multiple inheritance tricks and mostly freed myself from setting properties in init (though not entirely, and I'm not sure if I even want to totally remove this). I may clean up the 835 facade later, but I didn't want to introduce any breaking changes in this version.
I am understanding TigerShark more and more as I continue implementing things that S. Lott didn't get to, however this also means that I'm being bitten by the complexity of the project more often. There are a lot of good ideas in this project, and I keep encountering things I didn't expect (non-sequential hierarchical level grouping in 271 files, wtf??). TigerShark can handle most of these weird cases with minor bugfixes (which makes me more confident that this design was right from the start), but I don't think TigerShark can fully support 270/271 files due to their weird structuring. I intend to re-write a good portion of TigerShark after implementing several more formats, since I'll have a clear idea of the kind of requirements have to be met.
Nothing big, just a bugfix to ElementSequenceAccess (so it actually works) and moved two large enum types to an enums module.
Edit: Followup fix to allow unknown values in the enum x12 type, since it's possible that an insurance company returns an outdated remark code.
I realized that a single EOB file can contain multiple EOBs. This means that
the f835 facade now has a list of all of its individual EOBs as a facades
property.
I also fixed a few typos, added a ClaimAdjustments common X12LoopBridge with the corresponding claim adjustment reasons as an enum x12type, and improved the tests for 835 files.
This package is now being used in production, and the 835 facade can be considered somewhat stable.
I've added a setup.py script and organized the files a bit more. I'm considering this a major version bump because the inclusion of setup.py and pregenerated parsers makes this a lot closer to a fully usable package. I make no claim that any parser other than the 835 works as expected, since I have only dealt with 835 files so far.
Development will probably slow down now that things are mostly working. In the pipeline are auto-generated facades, or facades for 270/271 files, whichever I need to do first.
If this sort of thing interests you, the awesome biotech startup where I work is hiring. I can't say much about it other than it involves genes, real science, and we are currently saving lives and improving the future of humanity. Do drop me a line.
(Insurance billing is a painful but necessary step in this process.)
TigerShark was initially developed by S. Lott, et. al. The code was recently released at my request, after I stumbled on a few blog posts about the project:
- Python as Config Language - Forget XML and INI files (Jan 12, 2008)
- Two Python Config-File Design Patterns (Jan 19, 2008)
- Configuration File Scalability - Who Knew? (Revised) (Jan 26, 2008)
- Python as Configuration Language - More Good Ideas (March 28, 2008)
- Synchronicity and Document Object Models. (March 31, 2008)
- POPO and GOPS - Plain Old Python Objects and Good Old Python Syntax (April 1, 2008)
By the time I found those posts I had been struggling with X12 files for about two weeks, dealing with broken parsers and PDFs that cost thousands of dollars that describe the spec over 750 pages in human - but not, or only barely, machine - readable format. (How the healthcare industry gets away with getting the government to mandate a proprietary file format which you have to pay to read is the subject of another rant...).
I was struck by the amount of good, deep thought that went into the decisions S. Lott made, especially as compared to everything else I had seen. If you want to contribute to this project, I highly encourage you to go read those posts first.
What you see in version 0.1 is a series of hacks to get TigerShark working. I fixed a few bugs, added a facade for 835 files, and added setup instructions to the readme. The facade code is a mess (I didn't have enough time to fully understand the descriptor pattern and all of the underlying data structures Steven used), and I'll have to come back and make it nicer. Ultimately the facade should be able to be generated straight from the xml files which are used to build the parser. I removed a bunch of files that didn't appear to be used anywhere. I didn't try to get the demo django site working, and I'll either remove it or add instructions for it in a later version.
Many thanks to S. Lott for releasing the code and answering my questions, and to John Holland for providing the xml files in his package pyX12.
python setup.py install
The setup.py
script will install default parsers, but you might want to
generate your own, or you're fixing the generation script and need to test.
You can either convert all of the 4010 xml files in Downloads/pyx12-1.5.0.zip
or convert a file individually (which gives you more control over the result).
If you just want to generate all of the parsers, you can use the
generate_all_parsers
script:
git clone https://github.com/azoner/pyx12.git
cd pyx12
python setup.py sdist --formats=gztar,zip
cd ../
python tools/generate_all_parsers.py pyx12/dist/pyx12-*.zip -d parsers
This will generate all parsers in a directory called parsers
.
You can also just create a single parser from an unzipped pyx12 source:
git clone https://github.com/azoner/pyx12.git
cd parsers
python ../tools/convertPyX12.py 835.4010.X091.A1.xml M835_4010_X091_A1.py -b ../pyx12/pyx12/map/ -n parsed_835
This will generate a M835_4010_X091_A1.py
parser in your current directory.
from x12 import msg_835_4010_X091_A1
from x12.base import Source, X12Parser
from pathlib import Path
EXAMPLES = Path("/path") / "to" / "examples"
example = EXAMPLES / "835-example.txt"
document = Source(example.read_text())
# Skip some validation rules
errors_here = [
"*_N1:n101:Enumerated", "*_N1:n103:Enumerated", "*_REF:ref01:Enumerated",
"*_NM1:nm101:Enumerated", "*_NM1:nm102:Enumerated", "*_NM1:nm108:Enumerated"
]
parser = X12Parser(msg_835_4010_X091_A1.MSG835, skip_validation=errors_here)
msg = parser.parse(document)
Now you can access the segments of the X12 file in a purely pythonic way.
# Payee is in N1 in Loop 1000A. L1000A_N1 found in HEADER.ItemL1000A
# The ISA/GS/ST loop structure is always similar.
# Nested for statements make sense instead of always getting the first instance.
st = msg.isa_loop[0].gs_loop[0].st_loop[0]
# A good practice is the get the segment, and then process fields of the segment.
n1 = st.header[0].l1000a[0].n1
n4 = st.header[0].l1000a[0].n4
ts3 = st.detail[0].l2000[0].ts3
assert n1.n102 == "UNITED HEALTHCARE INSURANCE COMPANY"
assert n4.n403 == "553430000"
assert ts3.ts304 == 2.0
If you are kind enough to create a facade, please add unit tests. To run the tests that currently exist, run the following in the current directory.
python -m unittest discover
Note that if you first cd tests
and then run the unit tests, they will fail
because the tests expect certain files to be in certain paths.