Clang transpiler integration #756

Merged
26 commits
e3c0d25
development based clang transpiler integration
ViktorYastrebov May 9, 2024
0bf0c9d
added missing GitSubmodules.cmake
ViktorYastrebov May 9, 2024
5fc800e
fixes for code review & OpenMP/Serial bug fix of non-polymorphic call…
vyast-softserveinc May 9, 2024
240fed3
refactoring of integration, use function composition & callbacks stra…
vyast-softserveinc May 10, 2024
0892ad7
make unchanged files unchanged
vyast-softserveinc May 10, 2024
8e2f4a0
fix hipDeviceProp_t type to be the same as original HIP & revert back…
vyast-softserveinc May 14, 2024
3e0f167
fix package build without occa-transpiler
vyast-softserveinc May 14, 2024
9484255
update occa-transpiler version to v1.1
vyast-softserveinc May 14, 2024
1728fe7
update occa-transpiler to latest devel(fix cuda/hip intrinsics)
vyast-softserveinc May 15, 2024
7a98cc1
update occa-transpiler taggeed version
vyast-softserveinc May 15, 2024
6186901
move to tag v1.1 occa-transpiler
vyast-softserveinc May 16, 2024
85b3c35
added example with occa-transpiler and C++ featured okl kernel
vyast-softserveinc May 20, 2024
5767389
fixes for code review, move getTranspilerVersion from options to bin/…
vyast-softserveinc Jun 7, 2024
415ef25
update INSTALL.md & README.md documentation files
vyast-softserveinc Jun 26, 2024
73f9381
update occa-transpiler repo
IuriiKobein Aug 1, 2024
02d32bf
add option to build new transpiler with local installed clang
IuriiKobein Aug 2, 2024
7f82ac6
fix example of new oklt to support serial, openmp modes; remove debug…
IuriiKobein Sep 23, 2024
388bb47
add unsigned int to OCCA builtin types
IuriiKobein Sep 25, 2024
97736d2
update README and deps
IuriiKobein Oct 4, 2024
9718257
update occa-transpiler to v1.1
IuriiKobein Oct 16, 2024
264c37e
Remove occa-tranpiler as a submodule
thilinarmtb Oct 24, 2024
125e578
Make changes to link occa-transpiler as a library
thilinarmtb Oct 24, 2024
5c96a24
Add a link to occa-transpiler README in INSTALL.md
thilinarmtb Oct 24, 2024
494fde2
Fix a few typos
thilinarmtb Oct 24, 2024
f7fcb5f
Add a link to occa-transpiler repo
thilinarmtb Oct 25, 2024
fa73a9e
Merge pull request #8 from thilinarmtb/link_clang_transpiler_as_a_lib…
IuriiKobein Nov 7, 2024
Empty file added .gitmodules
Empty file.
13 changes: 12 additions & 1 deletion CMakeLists.txt
@@ -37,6 +37,7 @@ option(OCCA_ENABLE_DPCPP "Build with SYCL/DPCPP if available" ON)
option(OCCA_ENABLE_TESTS "Build tests" OFF)
option(OCCA_ENABLE_EXAMPLES "Build simple examples" OFF)
option(OCCA_ENABLE_FORTRAN "Enable Fortran interface" OFF)
option(OCCA_CLANG_BASED_TRANSPILER "Build with occa-transpiler dependency" OFF)

if(OCCA_ENABLE_FORTRAN)
enable_language(Fortran)
@@ -67,6 +68,11 @@ else()
set(OCCA_OS "OCCA_WINDOWS_OS")
endif()

# INFO: order is important, deps should not apply compiler flags
if (OCCA_CLANG_BASED_TRANSPILER)
find_package(oklt REQUIRED)
endif()

include(SetCompilerFlags)
include(CheckCXXCompilerFlag)

@@ -113,6 +119,11 @@ target_include_directories(libocca PRIVATE
$<BUILD_INTERFACE:${OCCA_SOURCE_DIR}/src>)

target_compile_definitions(libocca PRIVATE -DUSE_CMAKE)
if (OCCA_CLANG_BASED_TRANSPILER)
target_link_libraries(libocca PRIVATE occa::occa-transpiler)
target_compile_definitions(libocca PRIVATE -DBUILD_WITH_CLANG_BASED_TRANSPILER)
endif()

#=======================================

#---[ OpenMP ]--------------------------
@@ -231,7 +242,7 @@ if(OCCA_ENABLE_METAL AND APPLE)
endif()
endif()
#=======================================

if(NOT OCCA_IS_TOP_LEVEL)
# OCCA is being built as a subdirectory in another project
set(OCCA_OPENMP_ENABLED ${OCCA_OPENMP_ENABLED} PARENT_SCOPE)
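
Note on the new compile definition: `-DBUILD_WITH_CLANG_BASED_TRANSPILER`, added to `libocca` in this file, is presumably consumed by preprocessor guards in the library sources, which are not shown in this part of the diff. A minimal C++ sketch of that pattern follows. The helper names `hasClangTranspiler` and `selectTranspiler` are hypothetical and not taken from OCCA, and the fallback to the legacy parser is an assumption that the PR does not confirm.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from OCCA): reports whether this translation unit
// was compiled with the Clang-based transpiler enabled.
inline bool hasClangTranspiler() {
#ifdef BUILD_WITH_CLANG_BASED_TRANSPILER
  return true;
#else
  return false;
#endif
}

// Hypothetical dispatch: use the Clang-based front end only when version 3 is
// requested AND support was compiled in; otherwise use the legacy parser.
inline std::string selectTranspiler(int requestedVersion) {
  assert(requestedVersion > 0);
  if (requestedVersion >= 3 && hasClangTranspiler()) {
    return "clang-based";
  }
  return "legacy";
}
```

Compiled without the definition, `selectTranspiler(3)` returns `"legacy"`; compiled with `-DBUILD_WITH_CLANG_BASED_TRANSPILER`, it returns `"clang-based"`.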
26 changes: 23 additions & 3 deletions INSTALL.md
@@ -10,18 +10,19 @@

### Optional

- Fortan 90 compiler
- Fortran 90 compiler
- CUDA 9 or later
- HIP 3.5 or later
- SYCL 2020 or later
- OpenCL 2.0 or later
- OpenMP 4.0 or later
- occa-transpiler (for the Clang-based transpiler)

## Linux

### **Configure**

OCCA uses the [CMake] build system. For convenience, the shell script `configure-cmake.sh` has been provided to drive the Cmake build. The following table gives a list of build parameters which are set in the file. To override the default value, it is only necessary to assign the variable an alternate value at the top of the script or at the commandline.
OCCA uses the [CMake] build system. For convenience, the shell script `configure-cmake.sh` has been provided to drive the CMake build. The following table gives a list of build parameters which are set in the file. To override the default value, it is only necessary to assign the variable an alternate value at the top of the script or at the commandline.

Example
```shell
@@ -46,6 +47,7 @@ $ CC=clang CXX=clang++ OCCA_ENABLE_OPENMP="OFF" ./configure-cmake.sh
| OCCA_ENABLE_TESTS | Build OCCA's test harness | `ON` |
| OCCA_ENABLE_EXAMPLES | Build OCCA examples | `ON` |
| OCCA_ENABLE_FORTRAN | Build the Fortran language bindings | `OFF`|
| OCCA_CLANG_BASED_TRANSPILER | Build with the Clang-based transpiler, which supports C++ in OKL | `OFF`|
| FC | Fortran 90 compiler | `gfortran` |
| FFLAGS | Fortran compiler flags | *empty* |

@@ -67,7 +69,25 @@ After CMake configuration is complete, OCCA can be built with the command
$ cmake --build build --parallel <number-of-threads>
```

When cross compiling for a different platform, the targeted hardware doesn't need to be available; however all dependencies&mdash;e.g., headers, libraries&mdash;must be present. Commonly this is the case for large HPC systems, where code is compiled on login nodes and run on compute nodes.
When cross compiling for a different platform, the targeted hardware doesn't need to be available; however all dependencies&mdash;e.g., headers, libraries&mdash;must be present. Commonly this is the case for large HPC systems, where code is compiled on login nodes and run on compute nodes.


#### Building with Clang transpiler

The occa-transpiler repository can be found at [libocca/occa-transpiler](https://github.com/libocca/occa-transpiler/).
Please refer to the [occa-transpiler README](https://github.com/libocca/occa-transpiler/blob/main/README.md) for instructions on how to
build and install occa-transpiler.
You can then use the following commands to install OCCA with occa-transpiler enabled.
Replace `<occa-transpiler install dir>` with the root directory of your
occa-transpiler installation.

```shell
$ mkdir build
$ cd build
$ cmake -DCMAKE_BUILD_TYPE=Release -DOCCA_CLANG_BASED_TRANSPILER=ON -DCMAKE_PREFIX_PATH=<occa-transpiler install dir>/lib/cmake ..
$ cmake --build . --parallel <number-of-threads>
$ cmake --install . --prefix install
```

### Testing

4 changes: 2 additions & 2 deletions README.md
@@ -42,12 +42,13 @@ Mission critical computational science and engineering applications from the pub

### Optional

- Fortan 90 compiler
- Fortran 90 compiler
- CUDA 9 or later
- HIP 4.2 or later
- SYCL 2020 or later
- OpenCL 2.0 or later
- OpenMP 4.0 or later
- C++ support for OKL with clang based transpiler [new-okl-transpiler](https://github.com/libocca/occa-transpiler)

## Build, Test, Install

@@ -67,7 +68,6 @@ $ cmake --install build --prefix install

If dependencies are installed in a non-standard location, set the corresponding [environment variable](INSTALL.md#dependency-paths) to this path.


## Use

### Environment
10 changes: 10 additions & 0 deletions examples/cpp/31_oklt_v3_moving_avg/CMakeLists.txt
@@ -0,0 +1,10 @@
compile_cpp_example_with_modes(oklt_v3_moving_avg main.cpp)

add_custom_target(cpp_example_oklt_v3_moving_avg_cpy ALL
COMMAND ${CMAKE_COMMAND} -E copy ${CMAKE_CURRENT_SOURCE_DIR}/constants.h constants.h
COMMAND ${CMAKE_COMMAND} -E copy ${CMAKE_CURRENT_SOURCE_DIR}/movingAverage.okl movingAverage.okl)
add_dependencies(examples_cpp_oklt_v3_moving_avg cpp_example_oklt_v3_moving_avg_cpy)
target_sources(examples_cpp_oklt_v3_moving_avg
PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/movingAverage.okl
PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/constants.h
)
5 changes: 5 additions & 0 deletions examples/cpp/31_oklt_v3_moving_avg/constants.h
@@ -0,0 +1,5 @@
#pragma once

constexpr const int THREADS_PER_BLOCK = 1024;
//INFO: an extern @shared array cannot be sized dynamically for CUDA
constexpr const int WINDOW_SIZE = 16;
93 changes: 93 additions & 0 deletions examples/cpp/31_oklt_v3_moving_avg/main.cpp
@@ -0,0 +1,93 @@
#include <cmath>     // std::abs(float)
#include <iostream>
#include <occa.hpp>
#include <string>    // std::string
#include <vector>
#include "constants.h"

std::vector<float> buildData(std::size_t size,
float initialValue,
float fluctuation)
{
std::vector<float> buffer(size);
float currentValue = initialValue;
float longIncrement = 1.0f;
float fluctuationIncrement = fluctuation;
for(std::size_t i = 0; i < buffer.size(); ++i) {
buffer[i] = currentValue;
fluctuationIncrement = -fluctuationIncrement;
if(i % WINDOW_SIZE == 0) {
longIncrement = -longIncrement;
}
currentValue += longIncrement + fluctuationIncrement;
}
return buffer;
}

std::vector<float> goldMovingAverage(const std::vector<float> &hostVector) {
std::vector<float> result(hostVector.size() - WINDOW_SIZE);
for(std::size_t i = 0; i < result.size(); ++i) {
float value = 0.0f;
for(std::size_t j = 0; j < WINDOW_SIZE; ++j) {
value += hostVector[i + j];
}
result[i] = value / WINDOW_SIZE;
}
return result;
}

bool starts_with(const std::string &str, const std::string &substring) {
return str.rfind(substring, 0) == 0;
}

occa::json getDeviceOptions(int argc, const char **argv) {
for(int i = 0; i < argc; ++i) {
std::string argument(argv[i]);
if((starts_with(argument,"-d") || starts_with(argument, "--device")) && i + 1 < argc)
{
std::string value(argv[i + 1]);
return occa::json::parse(value);
}
}
return occa::json::parse("{mode: 'Serial'}");
}

int main(int argc, const char **argv) {

occa::json deviceOpts = getDeviceOptions(argc, argv);
auto inputHostBuffer = buildData(THREADS_PER_BLOCK * WINDOW_SIZE + WINDOW_SIZE, 10.0f, 4.0f);
std::vector<float> outputHostBuffer(inputHostBuffer.size() - WINDOW_SIZE);

occa::device device(deviceOpts);
occa::memory deviceInput = device.malloc<float>(inputHostBuffer.size());
occa::memory deviceOutput = device.malloc<float>(outputHostBuffer.size());

occa::json buildProps({
{"transpiler-version", 3}
});

occa::kernel movingAverageKernel = device.buildKernel("movingAverage.okl", "movingAverage32f", buildProps);

deviceInput.copyFrom(inputHostBuffer.data(), inputHostBuffer.size());

movingAverageKernel(deviceInput,
static_cast<int>(inputHostBuffer.size()),
deviceOutput,
static_cast<int>(deviceOutput.size()));

// Copy result to the host
deviceOutput.copyTo(&outputHostBuffer[0], outputHostBuffer.size());

auto goldValue = goldMovingAverage(inputHostBuffer);

constexpr const float EPSILON = 0.001f;
for(std::size_t i = 0; i < outputHostBuffer.size(); ++i) {
bool isValid = std::abs(goldValue[i] - outputHostBuffer[i]) < EPSILON;
if(!isValid) {
std::cout << "Comparison with gold values has failed" << std::endl;
return 1;
}
}
std::cout << "Comparison with gold has passed" << std::endl;
std::cout << "Moving average finished" << std::endl;

return 0;
}
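
Note on the host reference: `goldMovingAverage` above recomputes each window from scratch and relies on `hostVector.size() - WINDOW_SIZE`, which wraps around for inputs shorter than the window. As an alternative reference (a different technique from the one in the PR, offered only as a sketch), the same output can be produced with a running sum. The function name `movingAverageRunningSum` is hypothetical. The output length follows the example's convention of `input.size() - window` elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch: moving average via a running sum, O(n) instead of O(n * window).
// Produces input.size() - window values, matching the example's convention.
std::vector<float> movingAverageRunningSum(const std::vector<float> &input,
                                           std::size_t window) {
  assert(window > 0);
  if (input.size() < window) {
    return {};  // avoids unsigned wrap-around in the subtraction below
  }
  std::vector<float> result(input.size() - window);
  if (result.empty()) {
    return result;
  }
  double sum = 0.0;  // accumulate in double to limit rounding drift
  for (std::size_t j = 0; j < window; ++j) {
    sum += input[j];
  }
  for (std::size_t i = 0; i < result.size(); ++i) {
    result[i] = static_cast<float>(sum / static_cast<double>(window));
    // Slide the window: add the element entering, drop the one leaving.
    sum += static_cast<double>(input[i + window]) - static_cast<double>(input[i]);
  }
  return result;
}
```

For the input `{1, 2, 3, 4, 5}` with a window of 2, the function returns three values: 1.5, 2.5 and 3.5.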
85 changes: 85 additions & 0 deletions examples/cpp/31_oklt_v3_moving_avg/movingAverage.okl
@@ -0,0 +1,85 @@
#include "constants.h"

template<class T,
int THREADS,
int WINDOW>
struct MovingAverage {
MovingAverage(int inputSize,
int outputSize,
T *shared_input,
T *shared_output)
:_inputSize(inputSize)
,_outputSize(outputSize)
,_shared_data(shared_input)
,_result_data(shared_output)
{}

void syncCopyFrom(const T *input, int block_idx, int thread_idx) {
int linearIdx = block_idx * THREADS + thread_idx;
//INFO: copy base chunk
if(linearIdx < _inputSize) {
_shared_data[thread_idx] = input[linearIdx];
}
//INFO: copy WINDOW chunk
int tailIdx = (block_idx + 1) * THREADS + thread_idx;
if(tailIdx < _inputSize && thread_idx < WINDOW) {
_shared_data[THREADS + thread_idx] = input[tailIdx];
}
@barrier;
}

void process(int thread_idx) {
T sum = T();
for(int i = 0; i < WINDOW; ++i) {
sum += _shared_data[thread_idx + i];
}
_result_data[thread_idx] = sum / WINDOW;
@barrier;
}

void syncCopyTo(T *output, int block_idx, int thread_idx) {
int linearIdx = block_idx * THREADS + thread_idx;
if(linearIdx < _outputSize) {
output[linearIdx] = _result_data[thread_idx];
}
@barrier;
}
private:
int _inputSize;
int _outputSize;

//INFO: not supported
// @shared T _data[THREADS_PER_BLOCK + WINDOW_SIZE];
// @shared T _result[THREADS_PER_BLOCK];

T *_shared_data;
T *_result_data;
};

@kernel void movingAverage32f(@restrict const float *inputData,
int inputSize,
@restrict float *outputData,
int outputSize)
{
@outer(0) for (int block_idx = 0; block_idx < outputSize / THREADS_PER_BLOCK + 1; ++block_idx) {
@shared float blockInput[THREADS_PER_BLOCK + WINDOW_SIZE];
@shared float blockResult[THREADS_PER_BLOCK];
MovingAverage<float, THREADS_PER_BLOCK, WINDOW_SIZE> ma{
inputSize,
outputSize,
blockInput,
blockResult
};
@inner(0) for(int thread_idx = 0; thread_idx < THREADS_PER_BLOCK; ++thread_idx) {
ma.syncCopyFrom(inputData, block_idx, thread_idx);
}

@inner(0) for(int thread_idx = 0; thread_idx < THREADS_PER_BLOCK; ++thread_idx) {
ma.process(thread_idx);
}

@inner(0) for(int thread_idx = 0; thread_idx < THREADS_PER_BLOCK; ++thread_idx) {
ma.syncCopyTo(outputData, block_idx, thread_idx);
}
}
}
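
Note on the tiling scheme: each `@outer` block in the kernel above stages `THREADS_PER_BLOCK + WINDOW_SIZE` inputs (a base chunk plus a halo of `WINDOW_SIZE` elements), averages from that staging buffer, and writes back with a bounds guard. The serial C++ sketch below emulates those phases on the host so the indexing can be checked without a device. The function name `tiledMovingAverage` and its parameters are illustrative rather than part of the PR, and the averaging and write-back phases are folded into one loop for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial emulation of the kernel's block/halo indexing (illustrative only).
std::vector<float> tiledMovingAverage(const std::vector<float> &input,
                                      int outputSize, int threads, int window) {
  const int inputSize = static_cast<int>(input.size());
  assert(threads > 0 && window > 0 && outputSize >= 0);
  assert(window <= threads);                  // halo is loaded by the first `window` threads
  assert(outputSize + window <= inputSize);   // every written output has a full window

  std::vector<float> output(static_cast<std::size_t>(outputSize));
  // Stands in for the @shared input buffer of one block.
  std::vector<float> tile(static_cast<std::size_t>(threads + window));

  for (int block = 0; block < outputSize / threads + 1; ++block) {  // @outer
    // Phase 1: load the base chunk and the halo (first @inner loop).
    for (int t = 0; t < threads; ++t) {
      const int idx = block * threads + t;
      if (idx < inputSize) tile[t] = input[idx];
      const int tail = (block + 1) * threads + t;
      if (tail < inputSize && t < window) tile[threads + t] = input[tail];
    }
    // Phases 2 and 3: average from the tile and write back with a guard.
    for (int t = 0; t < threads; ++t) {
      const int idx = block * threads + t;
      if (idx >= outputSize) continue;
      float sum = 0.0f;
      for (int i = 0; i < window; ++i) sum += tile[t + i];
      output[idx] = sum / window;
    }
  }
  return output;
}
```

With input `{1, 2, 3, 4, 5, 6}`, `outputSize = 4`, `threads = 2` and `window = 2`, the result is 1.5, 2.5, 3.5, 4.5. The second output of each block reads one element from the halo.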
4 changes: 4 additions & 0 deletions examples/cpp/CMakeLists.txt
@@ -18,6 +18,10 @@ add_subdirectory(19_stream_tags)
add_subdirectory(20_native_dpcpp_kernel)
add_subdirectory(30_device_function)


if (OCCA_CLANG_BASED_TRANSPILER)
add_subdirectory(31_oklt_v3_moving_avg)
endif()
# Don't force-compile OpenGL examples
# add_subdirectory(16_finite_difference)
# add_subdirectory(17_mandelbulb)
2 changes: 2 additions & 0 deletions include/occa/dtype/builtins.hpp
@@ -21,7 +21,9 @@ namespace occa {
extern const dtype_t char_;
extern const dtype_t short_;
extern const dtype_t int_;
extern const dtype_t uint_;
extern const dtype_t long_;
extern const dtype_t ulong_;
extern const dtype_t float_;
extern const dtype_t double_;

4 changes: 3 additions & 1 deletion src/dtype/builtins.cpp
@@ -11,7 +11,9 @@ namespace occa {
const dtype_t char_("char", sizeof(char), true);
const dtype_t short_("short", sizeof(short), true);
const dtype_t int_("int", sizeof(int), true);
const dtype_t uint_("unsigned int", sizeof(unsigned int), true);
const dtype_t long_("long", sizeof(long), true);
const dtype_t ulong_("unsigned long", sizeof(unsigned long), true);
const dtype_t float_("float", sizeof(float), true);
const dtype_t double_("double", sizeof(double), true);

@@ -111,7 +113,7 @@ namespace occa {
}

template <> dtype_t get<unsigned long>() {
return long_;
return ulong_;
}

template <> dtype_t get<long long>() {
2 changes: 2 additions & 0 deletions src/dtype/dtype.cpp
@@ -413,6 +413,8 @@ namespace occa {
dtypeMap["long"] = &dtype::long_;
dtypeMap["float"] = &dtype::float_;
dtypeMap["double"] = &dtype::double_;
dtypeMap["unsigned long"] = &dtype::ulong_;
dtypeMap["unsigned int"] = &dtype::uint_;

// Sized primitives
dtypeMap["int8"] = dtype::get<int8_t>().ref;
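
Note on the dtype changes: before this PR, `get<unsigned long>()` returned `long_`, so the unsigned type was reported under the signed type's name. The standalone sketch below illustrates the idea behind the fix, one descriptor per distinct C++ type selected through explicit template specializations. `TypeInfo` and `describe` are hypothetical stand-ins, not OCCA's `dtype_t` or `dtype::get`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a dtype descriptor (not OCCA's dtype_t).
struct TypeInfo {
  std::string name;
  std::size_t bytes;
};

// Primary template is declared but not defined, so an unsupported type
// fails at link time instead of silently mapping to the wrong descriptor.
template <class T> TypeInfo describe();

template <> inline TypeInfo describe<int>() {
  return {"int", sizeof(int)};
}
template <> inline TypeInfo describe<unsigned int>() {
  return {"unsigned int", sizeof(unsigned int)};
}
template <> inline TypeInfo describe<long>() {
  return {"long", sizeof(long)};
}
// Signed and unsigned variants get distinct descriptors, mirroring the
// PR's switch from long_ to ulong_ for unsigned long.
template <> inline TypeInfo describe<unsigned long>() {
  return {"unsigned long", sizeof(unsigned long)};
}
```

In this sketch `describe<unsigned long>().name` is `"unsigned long"`, while `describe<long>().name` stays `"long"`, so kernel arguments of the two types can be told apart by name even though their sizes match.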