Simplify finding the hip package #88

Open
wants to merge 67 commits into
base: amd-master

@trixirt trixirt commented Jul 30, 2023

On Fedora, where hip is installed as an rpm, its CMake files cannot be found, and configuration fails with:

CMake Error at test/CMakeLists.txt:32 (find_package):
  No "FindHIP.cmake" found in CMAKE_MODULE_PATH.

This change treats hip as a normal package.

Evgeny and others added 30 commits September 6, 2020 01:07
Change-Id: I1baab986d36207b87f6f9ad5e0a45a9cffbea0c8
Change-Id: I50473f8008672772dd4aaf37cbc64472cb50b4a3
Change-Id: I33d01cde41d9a762a8a955a1faccfdef02d8c0ac
Change-Id: Ib8a0df7b7bb0da2e68b5b4d99ce8025de169f317
(cherry picked from commit 29da9a7)
Change-Id: Ib49c8ee034fb7481b21f950490e10b350f2a1b79
(cherry picked from commit 6567c48)
Change-Id: I9688236b06dd167960662b8eecf1a07c93b43fff
(cherry picked from commit c9ed0f0)
Change-Id: Id9203aec7800024bd749059a415fb29b8051005a
Change-Id: Ifca31d632726ab83f4c672b46cd9b97f817e757d
Change-Id: I2b00e5d310e6349fc52d5df60aae85f4c06adebe
Change-Id: Ib739cbb7538473afc9744e12d2bd568635e78616
(cherry picked from commit 1d975e5)
Change-Id: I7081b6ad21b038040267067bd73d8a44df46e4ff
… instrumentation;

Change-Id: Ibbc411541f5610ce739f3fc1efa1ab7f605220f5

initial commit

Change-Id: I34b360be62c2083819dc5c3acc8268bd69f2f58a
Change-Id: I0bc4ca977ce44f864178e78ec339888f86cbed8a
- All libs will have RUNPATH
- libtracer_tool.so gets a RUNPATH based on ROCM_RPATH when it is
  defined; otherwise no RUNPATH is set.

Signed-off-by: Pruthvi Madugundu <[email protected]>
Change-Id: I6515e603c82e1360e03eca2967f6a85e5faadc9a
Still needs valid email ID in the form of [email protected].
SWDEV-257322

Names complete as built (internal) :
roctracer-dev_1.0.0.40000-crdnnv.444_amd64.deb
roctracer-dev-1.0.0.40000-crdnnv.444.el7.x86_64.rpm

These changes are to satisfy:
http://confluence.amd.com/display/GPUCPT/Package+File+Naming

Change-Id: I5991326eb87d7dfa1304e3b2c5afb78f5a0c0361
Signed-off-by: Cole Nelson <[email protected]>
(cherry picked from commit 16ad4e9)
Change-Id: I99c22eec3fea6ac8820d574c44df099febdd27c4
(cherry picked from commit bb8f2f6)
Change-Id: I34957db88932e1ed725a0a0d8ca9a66fecc92e38
(cherry picked from commit 9061c4e)
Change-Id: Ibcaca6869ce96d8802c5fa8ba241f43834d6f2a7

update - codeobj event implementation

Change-Id: I4c12f26a19f2b31d9ac2211c3426a0e587a332b3

update2 - codeobj event implementation

Change-Id: Ic877549a83542ae00352503471d881e847ebac9c

test - codeobj event implementation

Change-Id: I0618d3a93de94c3d7467372ba4a3d4ea5520bfc7

URI reference test - codeobj event implementation

Change-Id: I6cf7e8a648cf012cb0708058b118a75e58f992b9

adding test/app - codeobj event implementation

Change-Id: Idf4c197c7b9116ccde5ec50ff47a26a858bfab32

uri test fix - codeobj event implementation

Change-Id: I7c385f82f516d9d8f2cd726366f00be3664006e3

uri test cleanup - codeobj event implementation

Change-Id: I542d5baf88c048c8b4717af843b803cd93e8f3bc

URI buffer fix - codeobj event implementation

Change-Id: Iac65e04c03a0939935c10f53c6b580a2e33878f5

HSA events tests trace-check disabled

Change-Id: I0f4d13aeeceb1d1a6e2191673eacbf9c7ae2ae52
…DCMAKE_DEBUG_TRACE=1'

Change-Id: Id16c01a6c00f6384c37fa9b5a9709a5e98e1fb57
Change-Id: I5fdb25b67eaae43b3c01cd8de3824f9343c37794
Change-Id: I292255adab3a70fa00a1dd5685b788521687f35b
Change-Id: Ic430e3f959119983a65929fc70332e293cc3448d
Change-Id: I18e2cfdf2574110bffa09d30c7ac1d3941252939
Change-Id: I0fd78c01595bbd506f42cf9dfb45f62b2124f704
Change-Id: I3dda55865bafa41cc6670e414b213f13a2a2a7ac
Change-Id: I43ca5e022d2c055b6a9bc2c09b4276b490a4b986
Change-Id: Ifd5f0fbad70afa1e79da8b4b9aa639d899cbea76
Change-Id: I1bf2a6093331e7a08179b9f64394c5c49206ef0e
Change-Id: Ia582a27482581c3b81c42da0add9f6743898da6c
Change-Id: I180b18f9e1fae40c923d6210901f06cba14e8f13
Rahul Mula and others added 28 commits February 3, 2022 12:39
…ster

Change-Id: Ie87fff9e4cdfd5d061d6ca09a623cbd9ef23efc5
…ster

Change-Id: I477207eb2ae1874809e4323b58b5dcae58dc050f
Updating amd-master branch with latest updates from ROCm 5.0.0
Change-Id: I1fa96e72169fac689a3a2ed38e988d7f5d18bf04
(cherry picked from commit ebda880)
…ster

Change-Id: I8bed468b05991f5967fe6d9b9ff49ac45120f9cc
…ster

Change-Id: Ia1e6f9d888219452fefc23728f5f85b44b4209eb
…ster

Change-Id: Ibf77ad39d05e28761ced1118461e0ad3d884f9c0
…ster

Change-Id: I73714fedd17a1ffad3ce38ce5323bf725b63586a
Replace the git clone of hsa-class with a locally downloaded copy pushed to the roctracer repo

Change-Id: Id45a38b2d355102c2e0dee1e4bfde50398369047
(cherry picked from commit 7ee4f87)
…code

Backward compatibility for components that search for contents in roctracer.h
Improvements: Removed redundant code for setting and unsetting variables
Added a header template file to the source code instead of generating it at build time

Change-Id: I96aeb7f2a6d53d45eb5aeb5300024cd22dad1324
(cherry picked from commit 8ca752c)
Change-Id: Ic81c0c20ab295e4120f6fc3b4f055559ebf1a8c5
Default to the HSA runtime's hsa_system_get_info if the saved HSA
functions table is not yet initialized.

Change-Id: I3659095a5ad662f7ca8b0d92bd035901c6d66bb0
(cherry picked from commit 87ffbd2)
commit 8a575d8
Author: Laurent Morichetti <[email protected]>
Date:   Fri Sep 30 13:24:23 2022 -0700

    Remove the thread local begin_timestamp stack

    Using a thread_local object is problematic as the thread local
    destructors are called first before any global destructor, making
    the object invalid while tearing down the process.

    rocblas uses a global destructor to clean up the loaded HIP modules
    and ends up calling hip_executable_destroy after the timestamp stack
    is destructed. As a result the begin timestamp for that API function
    is 0.

    The solution is to store the phase_enter timestamp in the phase_data.

    Change-Id: If143f4d123dfb111c72fb20365431d07e73fc570
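
A minimal sketch of the idea described in this commit message: keep the enter timestamp in the per-call data instead of a thread-local stack, so nothing depends on destructor ordering at process exit. `PhaseData`, `Phase`, and `OnApiCallback` are hypothetical names chosen for illustration and are not taken from the roctracer sources.

```cpp
#include <cstdint>

// Hypothetical per-call record owned by whoever invokes the callback.
// It outlives the enter/exit pair, so no thread_local storage is needed.
struct PhaseData {
  std::uint64_t phase_enter_timestamp = 0;
};

enum class Phase { kEnter, kExit };

// On enter, remember the timestamp in the call's own data.
// On exit, return the elapsed time computed from that stored value.
inline std::uint64_t OnApiCallback(Phase phase, PhaseData& data,
                                   std::uint64_t now_ns) {
  if (phase == Phase::kEnter) {
    data.phase_enter_timestamp = now_ns;
    return 0;
  }
  return now_ns - data.phase_enter_timestamp;
}
```

Because the timestamp travels with the call data, a callback that fires from a global destructor still sees a valid begin timestamp.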

commit 6416434
Author: Laurent Morichetti <[email protected]>
Date:   Fri Sep 30 11:02:27 2022 -0700

    Fix a profiling data corrupted error

    Using rocprof with ROCP_MCOPY_DATA=1 while tracing HSA produces the
    following error:

    tblextr.py: Memcpy args "(0x7feb16a00000, 123handle=28593376125, 0x7feb12a00010, 123handle=27558560125, 4194304, 0, 0, 123handle=140661639440000125) = 1" cannot be identified
    Profiling data corrupted: ' ./out/rpl_data_220930_143009_1826700/input_results_220930_143009/results.txt'

    There are two issues:

    1) The hsa_agent_t handle argument is misprinted: "123handle=...125"
      Instead of printing '{' and '}', it prints '123' and '125'. The wrong
      operator<<(unsigned char) is used and an integer value is printed
      instead of a char.

      Use std::operator<< instead of hsa_support::detail::operator<< to
      print '{' and '}'

    2) The result value is uninitialized and in some cases printed as a
      negative integer value. The leading '-' is not matched by the
      mem_manager regular expression for HSA API calls.

      Correctly capture the HSA function's return value.

    Change-Id: If13a1e62eeb4e598447c4b90d53d1b2e3b408696
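
The first issue above can be reproduced in isolation. The sketch below assumes a project-local `operator<<(unsigned char)` that prints its argument numerically; the `sketch` namespace and both helper functions are hypothetical stand-ins, not the actual `hsa_support::detail` code, and the commit does not show how the wrong overload came to be selected there. The expected strings assume an ASCII execution character set.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

namespace sketch {

// Hypothetical local overload that prints small integers as numbers.
inline std::ostream& operator<<(std::ostream& out, unsigned char v) {
  return out << static_cast<unsigned int>(v);
}

// Buggy variant: the braces are unsigned char values, so the local
// non-template overload wins over the std:: template and prints 123 / 125.
inline std::string FormatHandleBuggy(std::uint64_t handle) {
  std::ostringstream out;
  out << static_cast<unsigned char>('{') << "handle=" << handle
      << static_cast<unsigned char>('}');
  return out.str();
}

// Fixed variant: call std::operator<< explicitly so the braces are
// written as characters.
inline std::string FormatHandleFixed(std::uint64_t handle) {
  std::ostringstream out;
  std::operator<<(out, '{');
  out << "handle=" << handle;
  std::operator<<(out, '}');
  return out.str();
}

}  // namespace sketch
```

With a handle of 42, the buggy variant yields `123handle=42125`, the same shape as the string in the error message, while the fixed variant yields `{handle=42}`.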

commit 329c046
Author: Laurent Morichetti <[email protected]>
Date:   Wed Sep 28 15:41:05 2022 -0700

    Fix an issue with async copy timestamps

    The timestamps coming from the HIP runtime for asynchronous memory
    copies are corrupted (begin > end) because the HSA setting to record
    timestamps is turned off by the tracer's HSA intercept.

    The solution is to intercept hsa_amd_profiling_async_copy_enable and
    remember the application/runtime's request so that it can be ORed with
    IsEnabled(ACTIVITY_DOMAIN_HSA_OPS, HSA_OP_ID_COPY).

    Change-Id: Ib687cbf36711563e86c2bb8bc934c7c51572bfde
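
The ORing described in this commit message could look roughly like the following. This is a sketch under stated assumptions: `AsyncCopyTimestampState` and its methods are hypothetical, and the real code intercepts `hsa_amd_profiling_async_copy_enable` rather than calling a setter directly.

```cpp
// Hypothetical tracker for whether async-copy timestamps must stay enabled.
class AsyncCopyTimestampState {
 public:
  // Would be called from the intercepted enable call to remember what
  // the application or runtime asked for.
  void OnApplicationRequest(bool enable) { app_requested_ = enable; }

  // Would be called when the tracer turns copy activity tracing on or off.
  void OnTracerActivity(bool enabled) { tracer_enabled_ = enabled; }

  // Timestamps stay on if either side needs them.
  bool EffectiveSetting() const { return app_requested_ || tracer_enabled_; }

 private:
  bool app_requested_ = false;
  bool tracer_enabled_ = false;
};
```

The point of the design is that the tracer disabling its own activity no longer switches off timestamps the runtime still depends on.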

commit b664937
Author: Laurent Morichetti <[email protected]>
Date:   Mon Sep 26 09:35:03 2022 -0700

    Use the "safe" Stack for begin_timestamp

    The tracer tool needs to remember the begin timestamps for API
    callbacks, and uses a thread_local std::stack for that purpose.

    The issue with thread_local objects is that they are destructed
    before anything else when the main thread exits. To work around
    that issue, we use a "safe" stack in the roctracer API.

    Use the same "safe" stack in the tracer tool.

    Change-Id: I0d69d4eb44f0205f4102d0d5ef9803a1ec1800a5

commit a287f20
Author: Laurent Morichetti <[email protected]>
Date:   Mon Sep 26 09:27:07 2022 -0700

    Fix a typo in HipLoader

    rocprof errors out with the following message:
    symbol lookup 'KernelNameRef' failed: libamdhip64.so.5: undefined \
      symbol: KernelNameRef

    The HipLoader is incorrectly looking for a KernelNameRef symbol
    instead of hipKernelNameRef.

    Fixed the typo: KernelNameRef -> hipKernelNameRef.

    Change-Id: Ia4860e1669707b0c83d67e71b78d362b07a6aaa7

commit bb98bc7
Author: Laurent Morichetti <[email protected]>
Date:   Mon Sep 12 13:03:48 2022 -0700

    Clean up logger.h

    Change-Id: Ibcb58d2236b012d00c3fc421a425c03093de5d50

commit 67ce5fa
Author: Laurent Morichetti <[email protected]>
Date:   Thu Sep 15 10:33:38 2022 -0700

    Fix an array subscript out-of-bounds error

    Starting with gcc-11 (verified with gcc-12 as well), an array
    out-of-bounds subscript error is reported for accessing the registration
    table element at the operation ID index. Validating the index in the
    function calling Register/Unregister does not quiet the warning/error
    in release builds, so, for gcc-11 and gcc-12, we disable that warning
    just for the RegistrationTable class.

    Change-Id: I6bc4a02aa072cfa8905ecde5e3960aebf32fc912

commit 05ee3ff
Author: Laurent Morichetti <[email protected]>
Date:   Thu Sep 8 22:08:08 2022 -0700

    Cleanup the include files

    Use #include "header" instead of #include <header> so that the header
    files are found when the application #includes <roctracer/roctracer.h>
    with -I /opt/rocm/include.

    Change-Id: I24feac9a5030d3600aee98084340e246c3990db5

commit 4856d33
Author: Laurent Morichetti <[email protected]>
Date:   Fri Sep 9 10:04:16 2022 -0700

    SWDEV-355896 - Fix a data corruption error in post processing

    The post-processing script cannot handle HIP ops without a correlation
    ID. The correlation ID is needed to connect the record to a HIP stream
    and originating thread.

    This issue was exposed by a change to the tracer API to report
    asynchronous activities even if their originating synchronous API
    activity (callback) is not enabled. This was a flaw in the API.

    Also fix an issue with the API filtering. Undefined API names should
    not cause an exception; they should be ignored.

    Change-Id: Iab2221af6180ade2b9c2eb10c256c3a73d872e9f

commit 900d5e0
Author: Laurent Morichetti <[email protected]>
Date:   Thu Sep 8 21:04:41 2022 -0700

    Fix the symbol name for deprecated functions

    Change-Id: I53c0af1d1f6a3998992bdaa737e9b10829e5abc3

commit 87ffbd2
Author: Laurent Morichetti <[email protected]>
Date:   Thu Sep 8 18:30:54 2022 -0700

    Fix hsa_support::timestamp_ns if HSA is not yet initialized

    Default to the HSA runtime's hsa_system_get_info if the saved HSA
    functions table is not yet initialized.

    Change-Id: I3659095a5ad662f7ca8b0d92bd035901c6d66bb0

commit db69cc1
Author: Laurent Morichetti <[email protected]>
Date:   Thu Sep 8 00:31:03 2022 -0700

    Fix the Loader

    Instead of dlopen'ing a library with RTLD_NOLOAD (for example
    libamdhip64.so) and relying on the dynamic linker search path, search
    through the already loaded shared objects for a library with a
    matching name.

    Change-Id: I3e74d432bd7ca68df8927ca435b290e86aaaf9e9
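
One way to search the already loaded shared objects on Linux is `dl_iterate_phdr`. The sketch below is an assumption about how such a lookup could be written, not the loader's actual implementation; `FindLoadedLibrary`, `SearchState`, and `MatchLoadedObject` are hypothetical names, and matching by substring is a simplification.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // dl_iterate_phdr is a GNU/Linux extension
#endif
#include <link.h>

#include <cstddef>
#include <string>

namespace {

struct SearchState {
  std::string wanted;  // substring to look for in each object's path
  std::string found;   // path of the first matching object, if any
};

// Callback invoked once per loaded object; a nonzero return stops the walk.
int MatchLoadedObject(dl_phdr_info* info, std::size_t /*size*/, void* data) {
  auto* state = static_cast<SearchState*>(data);
  if (info->dlpi_name == nullptr) return 0;
  const std::string path(info->dlpi_name);
  if (path.find(state->wanted) == std::string::npos) return 0;
  state->found = path;
  return 1;
}

}  // namespace

// Returns the path of a loaded shared object whose name contains `name`,
// or an empty string if no such object is loaded.
std::string FindLoadedLibrary(const std::string& name) {
  if (name.empty()) return {};
  SearchState state{name, {}};
  dl_iterate_phdr(MatchLoadedObject, &state);
  return state.found;
}
```

A caller could pass the returned path to `dlopen` to obtain a handle to the exact object that is already mapped, rather than whatever the search path would resolve to.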

commit ab3f361
Author: Laurent Morichetti <[email protected]>
Date:   Wed Sep 7 21:12:33 2022 -0700

    SWDEV-351980 - Remove the ROCtracer private interface from the public header

    Change-Id: Ib3183e87d0c2bd1679926a4da9bbb6e46d70fb9f

commit 2673bf5
Author: Laurent Morichetti <[email protected]>
Date:   Fri Sep 2 12:40:15 2022 -0700

    SWDEV-351980 - Consolidate registration tables in the roctracer

    Change-Id: I44cd1cc81cf6a529aed89ee8db1377c0aa67f0dc

commit 57867e4
Author: Laurent Morichetti <[email protected]>
Date:   Thu Aug 18 14:50:51 2022 -0700

    Use fatal() and warning() for logging errors

    Change-Id: I4d525ed2a7dba72beff6fbe43383015e55465fcd

commit 9d69e7d
Author: Laurent Morichetti <[email protected]>
Date:   Tue Aug 2 09:18:27 2022 -0700

    Remove tracker.h

    Change-Id: I74860431c5f4c4954ddb79fb7e2a613fecc8793b

commit 61c232b
Author: Laurent Morichetti <[email protected]>
Date:   Mon Jul 11 08:18:26 2022 -0700

    Fix nested timestamps

    Change-Id: I6385d52cc858670a116f5c2eb65e4f19be73190f

commit 9c57b15
Author: Laurent Morichetti <[email protected]>
Date:   Thu Jul 7 13:18:38 2022 -0700

    Remove the ROCprofiler loader

    Was used for the HSA_EVT activities, so no longer needed.

    Change-Id: I7729fb4519f2e3cee73776264647381cb5826067

commit c2b87b1
Author: Laurent Morichetti <[email protected]>
Date:   Fri Jun 10 18:07:30 2022 -0700

    Bring the HSA_EVT callbacks back to the roctracer

    Change-Id: I26080b264d7989880ba7e9f00502cc680b2256d7

commit ac3214d
Author: Laurent Morichetti <[email protected]>
Date:   Thu Aug 18 20:55:54 2022 -0700

    Use a global correlation_id for all records

    Change-Id: I87fe16fefb52a95242bc64b7007b71c9d8978d44

commit 340c7cb
Author: Laurent Morichetti <[email protected]>
Date:   Tue Aug 30 18:47:00 2022 -0700

    SWDEV-351980 - Use the new hipRegister/RemoveAsyncActivityCallback

    Remove the hipInitActivityCallback and use the new hipRegister/
    RemoveActivityCallback which allows distinct memory pools to be used
    for HIP_OPS activities.

    Enable the multi_pool_activities test.

    Change-Id: I6f6feaedecc9c36285bea975caf24dbf8f5f624b

commit f0e082f
Author: Laurent Morichetti <[email protected]>
Date:   Tue Aug 16 20:03:10 2022 -0700

    SWDEV-351980 - Remove HipApi{Callback|Activity}{Enable|Disable}Check

    The code is easier to read if calling HIPActivityCallbackTracker
    enable/disable_check directly. Both enable/disable_check return the
    new mask, and the check whether a callback is already installed is
    clearer.

    Change-Id: Ic90d34489b5b4d9929dc08b4d9e93cc974b136b1

commit 88c6e0a
Author: Laurent Morichetti <[email protected]>
Date:   Thu Aug 4 11:38:08 2022 -0700

    SWDEV-351980 - Don't allocate hip_api_data and record

    The HIP runtime is now allocating the hip_api_data and record on its
    stack so we don't need the thread local record_data_pair stack anymore.

    Refactor the API callback function to handle both the case where
    synchronous user callbacks are requested and the case where asynchronous
    records are requested (enable_callback & enable_activity respectively).
    If the callback argument (memory pool) is not null, then activity
    records are requested.

    Remove CorrelationIdRegister and CorrelationIdLookup. These were used
    by the HIP runtime to associate a HIP record id to a ROCtracer
    correlation id. Instead, the HIP runtime is now using the correlation
    ID returned in the hip_api_data_t.

    Added a test to check enabling/disabling concurrent callbacks and
    activities.

    Change-Id: I5850cfead9861eb3602a3e8fcb7b22580d5fc979

commit ad01ba5
Author: Laurent Morichetti <[email protected]>
Date:   Tue Sep 6 18:57:20 2022 -0700

    Deprecate enable/disable_callback/activity[_expl]

    These functions have little value as it is very unlikely an application
    would want to enable all the domains.

    Change-Id: I4743e8ddf6743e60c95c7ba5240950d2ef734301

commit cfdfa2a
Author: Laurent Morichetti <[email protected]>
Date:   Fri Aug 26 11:19:07 2022 -0700

    Add multi_pool_activities test

    This test checks that asynchronous activities can be enabled in distinct
    memory pools. It enables activity reporting for HIP kernel dispatches in
    one memory pool, and memory copy reporting in another memory pool.

    The output of this test to stdout should be a series of kernel dispatch
    records (10) followed by a series of memory copy records (10). The
    records should not be interleaved.

    Change-Id: Idb5cca7e650b2312a1955909932364f914737856

commit 006ce7b
Author: Laurent Morichetti <[email protected]>
Date:   Mon Aug 22 21:20:04 2022 -0700

    Remove global variables from the file plugin

    The plugin's file scope global variables destructors could be called
    before roctracer_plugin_finalize is called, making the global variables
    undefined by the time roctracer_plugin_finalize is called.

    To avoid this issue, remove all non-pod global variables from the file
    plugin.

    Change-Id: I4b620d67d460d9c99adfd81cbf46b0e64540c503

commit bddb985
Author: Laurent Morichetti <[email protected]>
Date:   Fri Aug 19 10:31:16 2022 -0700

    Remove roctracer_mark

    This function has been deprecated since ROCm-2.9, use ROCTX's
    roctxMark(const char* message) as a replacement for roctracer_mark.

    Change-Id: Ie4aeae1db238453fc4451746cc9a338032ba817f

commit 4cd7497
Author: Ammar ELWazir <[email protected]>
Date:   Thu Aug 18 21:46:58 2022 -0500

    Fixing issues caused by the plugin patches

    - Multithreaded Applications and plugin destruction
    - Fixing Async-copy trace in file plugin
    - Adding the assert checkups for every trace buffer flush function

    Change-Id: I96e096fd7ee2604931200a0b446edb5ce49959dd

commit 753d543
Author: Laurent Morichetti <[email protected]>
Date:   Wed Aug 17 23:19:49 2022 -0700

    Use std::dec to print the begin_timestamp

    Change-Id: I88377b840b2e2cce278575bc398cbdc296e6dfd7

commit 80d363a
Author: Laurent Morichetti <[email protected]>
Date:   Wed Aug 17 14:25:41 2022 -0700

    New util library

    - Add string_printf/string_vprintf.
    - Add warning and error with backtrace support.

    Change-Id: I3dd73b4caed0d767bd9e39ffef15ff8484d0b0bf

commit 993dcf9
Author: Laurent Morichetti <[email protected]>
Date:   Wed Jul 13 14:02:38 2022 -0700

    Fix tput

    Don't set the color variables if tput is not available, not working, or
    if ncolors < 8.

    Move the color variables outside of eval to avoid calling tput over and
    over again.

    Change-Id: Id51a742b77ad0f7c99c1c7c5d05bed0f423b75de

commit b7e1f74
Author: Ammar ELWazir <[email protected]>
Date:   Thu Jun 23 01:50:07 2022 -0500

    Adding File Plugin

    - Added File plugin as the default plugin
    - Moved the flush functions to the plugins
    - Improved the flush to file implementation

    Change-Id: I80dd448eb8147a8ea4aa63b39bd1d0a4baf7252b

commit 1c7c5cc
Author: Ammar ELWazir <[email protected]>
Date:   Mon Aug 8 19:20:32 2022 -0500

    Adding Plugin Interface

    - Add roctracer plugins hooks
    - Add Roctracer plugin environment variable
    - Add the plugin class
    - Add the plugin implementation

    Change-Id: I12ee2e2be035abac14864764fb76837a4533cf60

commit 591db0b
Author: Ammar ELWazir <[email protected]>
Date:   Mon Aug 8 20:45:34 2022 -0500

    Changing NULL to nullptr (Tracer Tool)

    Change-Id: I567bf7944599922e5d402e55142c2915ae24fb69

Change-Id: I24f448b3510d3fa2451103621b822421c11e5921
Strings ([const] char *, [const] char[]) passed as arguments to API
functions may not always contain printable characters. All string
arguments should be quoted and escaped in the trace logs.

Change-Id: Ie39058f2190048b1a0090df16d9ac6bc6507e28a
(cherry picked from commit b556f86)
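
A minimal sketch of quoting and escaping a string argument for a trace log. The escape format shown here (C-style escapes, `\xHH` for other non-printable bytes) is an assumption for illustration; the commit message does not specify the format, and `QuoteAndEscape` is a hypothetical helper name.

```cpp
#include <cstdio>
#include <string>

// Wraps the input in double quotes and escapes quotes, backslashes,
// common control characters, and any other non-printable byte.
std::string QuoteAndEscape(const std::string& in) {
  std::string out = "\"";
  for (unsigned char c : in) {
    switch (c) {
      case '"':  out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n";  break;
      case '\t': out += "\\t";  break;
      default:
        if (c >= 0x20 && c < 0x7f) {
          out += static_cast<char>(c);  // printable ASCII passes through
        } else {
          char buf[5];  // backslash, 'x', two hex digits, NUL
          std::snprintf(buf, sizeof buf, "\\x%02x", static_cast<unsigned>(c));
          out += buf;
        }
    }
  }
  out += '"';
  return out;
}
```

Quoting every string argument keeps each trace record on one line and makes embedded delimiters unambiguous for a post-processing script.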
…e device's index

When multiple ranks are used, each rank's first logical device always
has GPU ID 0, regardless of which physical device is selected with
CUDA_VISIBLE_DEVICES. Because of this, when merging trace files from
multiple ranks, GPU IDs from different processes may overlap.

The long term solution is to use the KFD's gpu_id which is stable
across APIs and processes. Unfortunately the gpu_id is not yet exposed
by the ROCr, so for now use the driver's node id.

Change-Id: I2f5af8d2a7e8a89efeb5e0a1b86bdfa547b25fc8
(cherry picked from commit 799f032)
… OFF

Using wrapper header files will result in #warning message by default

Change-Id: Ib8a05d11f2391dfcdac8601da26e1096821cd555
…a corrupted

Change-Id: I3d8dbb2a40d948cd06cb1278acc50dc5be4ca0ef
(cherry picked from commit ee71368)
On Fedora, where hip is installed as an rpm, its CMake files cannot
be found, and configuration fails with:

CMake Error at test/CMakeLists.txt:32 (find_package):
  No "FindHIP.cmake" found in CMAKE_MODULE_PATH.

This change treats hip as a normal package.

Signed-off-by: Tom Rix <[email protected]>