Disable TORCH_MLIR_ENABLE_PYTORCH_EXTENSIONS #3654

Draft: wants to merge 11 commits from the torchextension branch into main

Conversation

@AmosLewis (Collaborator) commented Aug 20, 2024

Related discussion: https://discourse.llvm.org/t/drastically-reducing-documented-scope-of-project/80484

To fix the No module named 'torch_mlir._mlir_libs._jit_ir_importer' bug when testing PyTorch-related models such as pytorch/models/mit-b0:

python ./run.py  --tolerance 0.001 0.001 --cachedir /proj/gdba/shark/cache --ireebuild ../../iree-build -f pytorch -g models --mode onnx --report --tests  pytorch/models/mit-b0 
Starting e2eshark tests. Using 4 processes
Cache Directory: /proj/gdba/shark/cache
Tolerance for comparing floating point (atol, rtol) = (0.001, 0.001)
Note: No Torch MLIR build provided using --torchmlirbuild. iree-import-onnx will be used to convert onnx to torch onnx mlir
IREE build: /proj/gdba/shark/chi/src/iree-build
Test run directory: /proj/gdba/shark/chi/src/SHARK-TestSuite/e2eshark/test-run
Since --tests or --testsfile was specified, --groups ignored
Framework:pytorch mode=onnx backend=llvm-cpu runfrom=model-run runupto=inference
Test list: ['pytorch/models/mit-b0']
Test pytorch/models/mit-b0 failed [model-run]
Generated status report /proj/gdba/shark/chi/src/SHARK-TestSuite/e2eshark/test-run/statusreport.md
Generated time report /proj/gdba/shark/chi/src/SHARK-TestSuite/e2eshark/test-run/timereport.md
Generated summary report /proj/gdba/shark/chi/src/SHARK-TestSuite/e2eshark/test-run/summaryreport.md

If I use:

pip install \                                                                     
            --find-links https://github.com/llvm/torch-mlir-release/releases/expanded_assets/dev-wheels \
            --upgrade \
            torch-mlir

pip list:

torch-mlir         20240820.189

Error in model-run.log

python runmodel.py  --torchmlirimport fximport --todtype default --mode onnx --outfileprefix mit-b0 1> model-run.log 2>&1
...
Traceback (most recent call last):
  File "/proj/gdba/shark/chi/src/SHARK-TestSuite/e2eshark/test-run/pytorch/models/mit-b0/runmodel.py", line 78, in <module>
    from torch_mlir import torchscript
  File "/proj/gdba/shark/chi/src/SHARK-TestSuite/e2eshark/e2e_venv/lib/python3.10/site-packages/torch_mlir/torchscript.py", line 25, in <module>
    from torch_mlir.jit_ir_importer import ClassAnnotator, ImportOptions, ModuleBuilder
  File "/proj/gdba/shark/chi/src/SHARK-TestSuite/e2eshark/e2e_venv/lib/python3.10/site-packages/torch_mlir/jit_ir_importer/__init__.py", line 14, in <module>
    from .._mlir_libs._jit_ir_importer import *
ModuleNotFoundError: No module named 'torch_mlir._mlir_libs._jit_ir_importer'

The error disappears if I pip uninstall torch-mlir and instead use my local 0820 build of this patch, via export PYTHONPATH=${PYTHONPATH}:/proj/gdba/shark/chi/src/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir

@AmosLewis (Collaborator, Author):

I get an error: No module named 'torch_mlir_e2e_test'

  FAIL: TORCH_MLIR :: python/fx_importer/sparsity/sparse_test.py (95 of 105)
  ******************** TEST 'TORCH_MLIR :: python/fx_importer/sparsity/sparse_test.py' FAILED ********************
  Exit Code: 2
  
  Command Output (stderr):
  --
  RUN: at line 6: /opt/python/cp311-cp311/bin/python /_work/torch-mlir/torch-mlir/test/python/fx_importer/sparsity/sparse_test.py | FileCheck /_work/torch-mlir/torch-mlir/test/python/fx_importer/sparsity/sparse_test.py
  + FileCheck /_work/torch-mlir/torch-mlir/test/python/fx_importer/sparsity/sparse_test.py
  + /opt/python/cp311-cp311/bin/python /_work/torch-mlir/torch-mlir/test/python/fx_importer/sparsity/sparse_test.py
  Traceback (most recent call last):
    File "/_work/torch-mlir/torch-mlir/test/python/fx_importer/sparsity/sparse_test.py", line 19, in <module>
      from torch_mlir_e2e_test.linalg_on_tensors_backends.refbackend import (
  ModuleNotFoundError: No module named 'torch_mlir_e2e_test'
  FileCheck error: '<stdin>' is empty.
  FileCheck command line:  FileCheck /_work/torch-mlir/torch-mlir/test/python/fx_importer/sparsity/sparse_test.py

@stellaraccident Any suggestions? Should we just disable TORCH_MLIR_ENABLE_JIT_IR_IMPORTER?

@stellaraccident (Collaborator):

Yeah, you can try that. I think we could also just move the e2e test tools out of that and into the main project. Would need to look at it some more.

@AmosLewis (Collaborator, Author) commented Aug 21, 2024

> Yeah, you can try that. I think we could also just move the e2e test tools out of that and into the main project. Would need to look at it some more.

Run Linalg e2e integration tests
  Traceback (most recent call last):
    File "<frozen runpy>", line 198, in _run_module_as_main
    File "<frozen runpy>", line 88, in _run_code
    File "/_work/torch-mlir/torch-mlir/projects/pt1/e2e_testing/main.py", line 20, in <module>
      from torch_mlir_e2e_test.configs import (
    File "/_work/torch-mlir/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir_e2e_test/configs/__init__.py", line [7](https://github.com/llvm/torch-mlir/actions/runs/10482533744/job/29033805586?pr=3654#step:8:8), in <module>
      from .linalg_on_tensors_backend import LinalgOnTensorsBackendTestConfig
    File "/_work/torch-mlir/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir_e2e_test/configs/linalg_on_tensors_backend.py", line [9](https://github.com/llvm/torch-mlir/actions/runs/10482533744/job/29033805586?pr=3654#step:8:10), in <module>
      from torch_mlir import torchscript
    File "/_work/torch-mlir/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir/torchscript.py", line 25, in <module>
      from torch_mlir.jit_ir_importer import ClassAnnotator, ImportOptions, ModuleBuilder
  ModuleNotFoundError: No module named 'torch_mlir.jit_ir_importer'

@stellaraccident The tosa/stablehlo/linalg default e2e tests use from torch_mlir import torchscript in their test configs, so they make the CI tests fail. Is it OK to just delete these 3 default e2e tests, since we already have fx_importer/fx_importer_stablehlo/fx_importer_tosa?
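
For reference, a minimal sketch of the fx_importer-based path that those replacement configs build on, assuming the torch_mlir.fx.export_and_import entry point (keyword arguments may differ between versions):

```python
# Hypothetical sketch (not part of this PR): the fx-based import path used by
# the fx_importer e2e configs, which never touches the jit_ir_importer
# extension. Assumes torch_mlir.fx.export_and_import is available in the
# installed wheel or local build.
import torch
from torch_mlir import fx


class Tiny(torch.nn.Module):
    def forward(self, x):
        return torch.tanh(x)


# torch.export-based tracing plus FX import into the Torch dialect.
module = fx.export_and_import(Tiny(), torch.randn(3), output_type="torch")
print(module)
```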

@stellaraccident (Collaborator):

Yes, go ahead and delete. We can't support the old thing anymore and have replacements.

@AmosLewis (Collaborator, Author) commented Aug 21, 2024

Run Linalg e2e integration tests
  Traceback (most recent call last):
    File "<frozen runpy>", line 19[8](https://github.com/llvm/torch-mlir/actions/runs/10498026005/job/29082120990?pr=3654#step:8:9), in _run_module_as_main
    File "<frozen runpy>", line 88, in _run_code
    File "/_work/torch-mlir/torch-mlir/projects/pt1/e2e_testing/main.py", line 20, in <module>
      from torch_mlir_e2e_test.configs import (
    File "/_work/torch-mlir/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir_e2e_test/configs/__init__.py", line 8, in <module>
      from .onnx_backend import OnnxBackendTestConfig
    File "/_work/torch-mlir/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir_e2e_test/configs/onnx_backend.py", line 16, in <module>
      from torch_mlir_e2e_test.utils import convert_annotations_to_placeholders
    File "/_work/torch-mlir/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir_e2e_test/utils.py", line 6, in <module>
      from torch_mlir.torchscript import TensorPlaceholder
    File "/_work/torch-mlir/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir/torchscript.py", line 25, in <module>
      from torch_mlir.jit_ir_importer import ClassAnnotator, ImportOptions, ModuleBuilder
  ModuleNotFoundError: No module named 'torch_mlir.jit_ir_importer'
    File "/_work/torch-mlir/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir_e2e_test/configs/__init__.py", line 9, in <module>
      from .torchdynamo import TorchDynamoTestConfig
    File "/_work/torch-mlir/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir_e2e_test/configs/torchdynamo.py", line 25, in <module>
      from torch_mlir.torchscript import (
    File "/_work/torch-mlir/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir/torchscript.py", line 25, in <module>
      from torch_mlir.jit_ir_importer import ClassAnnotator, ImportOptions, ModuleBuilder
  ModuleNotFoundError: No module named 'torch_mlir.jit_ir_importer'

Need to disable the onnx/onnx_tosa/torchdynamo e2e tests as well, but these do not have a replacement.

AmosLewis force-pushed the torchextension branch 2 times, most recently from 3cd1ed6 to a796f37, on August 21, 2024 22:50
@AmosLewis (Collaborator, Author) commented Aug 21, 2024

main.py: error: argument -c/--config: invalid choice: 'linalg'
 (choose from 'native_torch', 'torchscript', 'lazy_tensor_core', 'onnx',
  'onnx_tosa', 'fx_importer', 'fx_importer_stablehlo', 'fx_importer_tosa')

Need to change the CI test_posix.sh. If I change linalg/tosa/stablehlo to their fx_importer versions, 5 tests fail in the torch-stable build ("IsFloatingPointFloat_True", "IsFloatingPointInt_False", "ScalarConstantTupleModule_basic", "TorchPrimLoopForLikeModule_basic", "TorchPrimLoopWhileLikeModule_basic") while torch-nightly passes, so I just delete them from the torch-stable run in test_posix.sh.

@AmosLewis (Collaborator, Author):

 Check that update_abstract_interp_lib.sh has been run
  /opt/python/cp311-cp311/bin/python: Error while finding module specification for 'torch_mlir.jit_ir_importer.build_tools.abstract_interp_lib_gen' (ModuleNotFoundError: No module named 'torch_mlir.jit_ir_importer')
  Error: Process completed with exit code 1.

Error in torch-nightly; need to delete the torch_mlir.jit_ir_importer reference in the .sh file.

AmosLewis marked this pull request as ready for review on August 22, 2024 00:13
@AmosLewis (Collaborator, Author):

@stellaraccident need a review.

@stellaraccident (Collaborator) left a comment

I think we need to come up with a replacement for update_torch_ods.sh and possibly update_abstract_interp_lib.sh before landing this. I believe that it is mostly historical that both of them rely on that one method in the JitIR extension to get the op registry, and I believe there are more direct ways to go about that these days. Been on my list for a very long time to research this... If I recall the method they rely on is just using a C++ API to get all of the schemas and then putting them together into a JSON struct for the code generators to use. There may be a comparative API on the Python side these days, or worst case, we could just parse the op definition yaml files like PyTorch itself does. Probably not a lot of work but may take some digging.
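
As a rough illustration of the yaml-parsing fallback mentioned above, a minimal sketch assuming the native_functions.yaml copy bundled with the torchgen package in recent PyTorch wheels (the path and field names are assumptions, and the real generators would need far more of the schema than this):

```python
# Hypothetical sketch of the "parse the op definition yaml files" fallback:
# read schema strings from the native_functions.yaml that torchgen ships,
# instead of going through the JitIR extension. The path below assumes the
# layout of recent PyTorch wheels.
import os

import torchgen
import yaml


def load_native_function_schemas():
    yaml_path = os.path.join(
        os.path.dirname(torchgen.__file__),
        "packaged", "ATen", "native", "native_functions.yaml",
    )
    with open(yaml_path, "r") as f:
        entries = yaml.safe_load(f)
    # Each entry carries a "func" schema string such as
    # "add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor".
    return [e["func"] for e in entries if "func" in e]


if __name__ == "__main__":
    schemas = load_native_function_schemas()
    print(f"Loaded {len(schemas)} schema strings")
```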

@@ -42,8 +42,3 @@ if [ ! -z ${TORCH_MLIR_EXT_MODULES} ]; then
fi

set +u
Review comment (Collaborator):

As tempting as it is to just delete these lines and call it done, I'm not sure we can do this: people still use this tool like this.

@@ -40,8 +40,3 @@ TORCH_MLIR_EXT_MODULES="${TORCH_MLIR_EXT_MODULES:-""}"
if [ ! -z ${TORCH_MLIR_EXT_MODULES} ]; then
ext_module="${TORCH_MLIR_EXT_MODULES} "
fi

Review comment (Collaborator):

Ditto: We can't just make this a no-op.

@dbabokin (Contributor):

Another weird thing about torch_mlir._mlir_libs._jit_ir_importer: the lib is built empty, at least that's what I observe during my local build. And that does cause problems in macOS builds (not Linux!). More details are here: #3663

Should we remove this as part of this PR as well?

torch-mlir/setup.py

Lines 238 to 242 in 9a6fe58

EXT_MODULES.extend(
    [
        CMakeExtension("torch_mlir._mlir_libs._jit_ir_importer"),
    ]
)

@stellaraccident any opinion on that?

@stellaraccident (Collaborator):

> (quoting @dbabokin's comment above about the empty _jit_ir_importer lib and the setup.py CMakeExtension entry)

Yeah. Let's get the last two code generation things here separated and then do a full excision.

@stellaraccident (Collaborator):

> (quoting the Linalg e2e traceback and the note above about disabling the onnx/onnx_tosa/torchdynamo e2e tests)

Just disable/remove it. I wasn't aware a new dep like this was added, and we've been quite clear we're moving away from this. Was probably just an oversight and using the wrong thing -- the folks doing that will need to upgrade.

AmosLewis force-pushed the torchextension branch 3 times, most recently from a8205c8 to b6a6958, on August 25, 2024 02:06
@penguin-wwy (Collaborator):

> (quoting the Linalg e2e traceback and the note above about disabling the onnx/onnx_tosa/torchdynamo e2e tests)

In #3668 the symbols required by the onnx e2e tests have been extracted into a common interface. Now they should no longer depend on jit_ir_importer.

@stellaraccident (Collaborator):

> (quoting the Linalg e2e traceback, the note about the onnx/onnx_tosa/torchdynamo e2e tests, and @penguin-wwy's reply about #3668 above)

Thank you for catching/fixing that. I had missed it.

- To fix the No module named 'torch_mlir._mlir_libs._jit_ir_importer' bug.
- To fix the No module named 'torch_mlir_e2e_test' bug when testing torch-stable.
@AmosLewis (Collaborator, Author) commented Sep 6, 2024

> (quoting the review comment above about finding a replacement for update_torch_ods.sh and update_abstract_interp_lib.sh before landing)

@stellaraccident
Based on reading the code, the call path for both .sh scripts is:

  1. update_torch_ods.sh -> torch_ods_gen.py -> registry.py Registry.load() -> get_registered_ops.cpp getRegisteredOps() -> pytorch/torch/csrc/jit/runtime/operator.cpp torch::jit::getAllOperators()

  2. update_abstract_interp_lib.sh -> abstract_interp_lib_gen.py -> library_generator.py -> registry.py Registry.load() -> get_registered_ops.cpp getRegisteredOps() -> pytorch/torch/csrc/jit/runtime/operator.cpp torch::jit::getAllOperators()

I guess the one method in the JitIR extension to get the op registry you mentioned is torch::jit::getAllOperators(), and a comparable API on the Python side is probably torch._C._jit_get_all_schemas(). With that, since people still use these .sh scripts, we can rewrite get_registered_ops.cpp in Python and call it from registry.py. Do you think it would be good to move build_tools outside of jit_ir_importer/?
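
A minimal sketch of what that pure-Python path might look like, assuming torch._C._jit_get_all_schemas() exposes the same registry as torch::jit::getAllOperators() (the record layout below is illustrative only; registry.py would need its exact format reproduced):

```python
# Hypothetical sketch: enumerate registered op schemas from Python via
# torch._C._jit_get_all_schemas(), as a possible pure-Python replacement for
# the getRegisteredOps() C++ binding in get_registered_ops.cpp.
import torch


def get_registered_ops_py():
    ops = []
    for schema in torch._C._jit_get_all_schemas():
        ops.append(
            {
                "name": (schema.name, schema.overload_name),
                "arguments": [
                    {"name": a.name, "type": str(a.type)} for a in schema.arguments
                ],
                "returns": [
                    {"name": r.name, "type": str(r.type)} for r in schema.returns
                ],
            }
        )
    return ops


if __name__ == "__main__":
    print(f"Found {len(get_registered_ops_py())} registered op schemas")
```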

Another issue you didn't mention is in the call path:
3. update_abstract_interp_lib.sh -> abstract_interp_lib_gen.py -> library_generator.py -> module_builder.h
Are we also going to rewrite the module_builder with another comparable API on the Python side, since we want to get rid of building all the .h/.cpp files under csrc/jit_ir_importer/?

@AmosLewis (Collaborator, Author):

The pure-Python ODS implementation will be in PR #3780.
