fix: Inference engine Nitro with Windows with/ without CUDA #950

Merged
merged 1 commit into from
Dec 13, 2023

@hiro-v hiro-v commented Dec 12, 2023

On Windows, Nitro has a serious problem in CPU-only mode, and it also cannot use an NVIDIA GPU properly.

See the comments below for the test results.

@hiro-v hiro-v added P0: critical Mission critical type: bug Something isn't working labels Dec 12, 2023
@hiro-v hiro-v added this to the Jan on Windows milestone Dec 12, 2023
@hiro-v hiro-v requested review from tikikun and hiento09 December 12, 2023 03:56
@hiro-v hiro-v self-assigned this Dec 12, 2023
@hiro-v hiro-v force-pushed the fix/windows_cuda_nitro branch from 133de60 to 22c8e18 Compare December 12, 2023 07:01
@hiro-v hiro-v force-pushed the fix/windows_cuda_nitro branch from 22c8e18 to 2d78857 Compare December 12, 2023 12:47

hiro-v commented Dec 12, 2023

  • CPU:
    CleanShot_2023-12-12_at_23 47 26
  • Jan app usage:
    CleanShot_2023-12-12_at_23 47 43

Machine:

  • OS: Windows 11 Home

  • No AMD/ NVIDIA GPU

  • CPU: Intel 8th i7

  • Memory: 16GB

  • Model: TinyLlama, but other models yield the same performance

  • Performance: 15 tokens/s (very reasonable)

  • It's not laggy because it uses physical cores rather than logical cores. On Windows machines the CPU normally supports hyper-threading, which exposes 2 logical cores per physical core, and Intel 12th gen and later CPUs mix performance and efficiency cores, which makes the count hard to calculate. This setup helps a lot on an ordinary computer: compute-intensive tasks (the LLM) are scheduled on the performance cores, while IO-intensive tasks (even OS background tasks) run on the others.
    The CPU is unlikely to hit 100% and stay at that level (better than what we experienced with the current stable release on Windows).
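
For illustration only, here is a minimal sketch of the "physical cores, not logical cores" idea described above. It is not Nitro's actual detection code. The helper name `pick_thread_count` and the `threads_per_core=2` default are assumptions taken from the hyper-threading description, and the halving heuristic is not exact on hybrid CPUs with performance/efficiency cores, where efficiency cores typically expose one thread each.

```python
import os


def pick_thread_count(logical_cpus, threads_per_core=2):
    """Estimate a worker-thread count from the logical CPU count.

    Hypothetical helper: assumes every physical core exposes
    `threads_per_core` logical cores (hyper-threading), which is
    only an approximation on hybrid P-core/E-core CPUs.
    """
    # os.cpu_count() may return None; fall back to a single thread.
    if logical_cpus is None or logical_cpus < 1:
        return 1
    # Integer division approximates the physical core count;
    # never go below one thread.
    return max(1, logical_cpus // threads_per_core)


# Example: report the estimate for the current machine.
logical = os.cpu_count()
print(f"logical CPUs: {logical}, suggested threads: {pick_thread_count(logical)}")
```

Keeping the thread count at or below the physical core count leaves headroom for the OS and UI, which fits the observation that the CPU no longer sits at 100%.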

However, we observed some Windows-specific problems:
After Windows sleeps and wakes up, performance decreases by approx. 20%.
Windows on battery (not plugged in) has a performance decrease as well, but I did not have a chance to test this.


hiro-v commented Dec 12, 2023

  • CPU:
    CleanShot 2023-12-13 at 00 28 40
  • NVIDIA GPU VRAM usage with Nitro log
    CleanShot 2023-12-13 at 00 29 42
  • GPU utilization and VRAM consumption
    CleanShot 2023-12-13 at 00 30 37
  • Jan App performance
    CleanShot 2023-12-13 at 00 29 58

Machine:

  • OS: Windows 11 Home
  • NVIDIA GPU 3090 with 24GB VRAM
  • CPU: Intel 13th i9
  • Memory: 64GB

Test:

  • Model: TinyLlama
  • Nitro version: 0.1.26
  • Performance (impressive): 57 tokens/s (almost fully offloaded to the NVIDIA GPU)

@hiro-v hiro-v requested review from a team and removed request for tikikun and hiento09 December 12, 2023 17:39

hiro-v commented Dec 12, 2023

Test result on the laptop that had no NVIDIA GPU; I have now plugged in an NVIDIA GTX 1050 Ti with 4 GB of memory.
The best part is that after closing the Jan app and opening it again, it can use the NVIDIA GPU right away without any further installation, which is good UX.

Here is the test result:

  • Jan app result (22 tokens/s)
    CleanShot 2023-12-13 at 00 58 08

  • CPU usage
    CleanShot 2023-12-13 at 00 57 14

  • GPU usage
    CleanShot 2023-12-13 at 00 57 24

  • Nitro NVIDIA utilization
    CleanShot 2023-12-13 at 00 57 35

  • Model: TinyLlama

  • Perf: 22 tokens/s
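
To put the reported numbers side by side, here is a small arithmetic sketch. The throughput figures are the ones quoted in these comments; the ratios and the post-sleep estimate are derived values, not additional measurements, and the function name `speedup` is just for illustration. The 57 tokens/s result comes from a different machine (13th gen i9 with a 3090), so its ratio against the laptop's CPU-only run is only a rough comparison.

```python
def speedup(new_tps, baseline_tps):
    """Return how many times faster new_tps is than baseline_tps."""
    return new_tps / baseline_tps


# Tokens per second as reported in the comments.
cpu_only_laptop = 15.0      # 8th gen i7, no GPU
gtx_1050ti_laptop = 22.0    # same laptop with the GTX 1050 Ti plugged in
rtx_3090_desktop = 57.0     # different machine: 13th gen i9 + 3090

# Same laptop, CPU only vs. GTX 1050 Ti.
print(f"1050 Ti vs CPU only: {speedup(gtx_1050ti_laptop, cpu_only_laptop):.2f}x")

# Different machines, so treat this as a rough comparison only.
print(f"3090 vs laptop CPU only: {speedup(rtx_3090_desktop, cpu_only_laptop):.2f}x")

# Derived estimate: an approx. 20% drop after sleep/wake on the CPU-only laptop.
after_sleep = cpu_only_laptop * (1 - 0.20)
print(f"CPU only after sleep/wake: about {after_sleep:.0f} tokens/s")
```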

@hiro-v hiro-v merged commit a7099a4 into main Dec 13, 2023
@hiro-v hiro-v deleted the fix/windows_cuda_nitro branch December 13, 2023 04:24