titanv CI broken #261

Closed
ghost opened this issue Feb 3, 2020 · 23 comments

@ghost

ghost commented Feb 3, 2020

Stable build (build #360):

./headercvt stub.c -- -I /opt/clpy/llvm-7.1.0/lib/clang/7.1.0/include -I/usr/local/cuda/include

Issue #260 (build #361):

./headercvt stub.c -- -I /opt/clpy/llvm-7.1.0/lib/clang/7.1.0/include

The function build.get_cuda_path() may fail to find the CUDA path (possibly because of a deleted symlink?).

@vorj vorj changed the title from _titanv_ CI broken to titanv CI broken on Feb 3, 2020
@vorj

vorj commented Feb 3, 2020

> The function build.get_cuda_path() may fail to find the CUDA path (possibly because of a deleted symlink?).

cuda_path = build.get_cuda_path()
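
For reference, a minimal shell check of that hypothesis (assuming /usr/local/cuda is the conventional symlink that the CUDA path lookup relies on):

```shell
# If the versioned toolkit exists but the /usr/local/cuda symlink is gone,
# a lookup based on that symlink comes up empty and the CUDA include path
# (-I/usr/local/cuda/include) never reaches headercvt.
ls -ld /usr/local/cuda*                               # installed toolkits and the symlink
readlink /usr/local/cuda || echo "symlink missing"
ls /usr/local/cuda/include/CL/cl.h || echo "CL/cl.h not reachable via the symlink"
```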

@vorj

vorj commented Feb 3, 2020

It seems that @nsakabe-fixstars's opinion is right.
cuda-10.2 has been installed on titanv, so the symlink may have disappeared during the reboot last weekend (last week, cuda-9.2 was still available on titanv).

@LWisteria
Member

/usr/local/cuda-10.2/include/CL/cl.h?

@ghost
Author

ghost commented Feb 3, 2020

The path problem has been resolved.

However, there is another issue. The CUDA 10.2 compiler does not accept our carray.clh.
#260 (review)

See Jenkins' log.

fp16.clh:35:10: error: loading directly from pointer to type 'const __attribute__((address_space(16776963))) half' is not allowed
E     return *(const half*)&ret;
carray.clh:559:13: error: declaring function return value of type 'half' is not allowed; did you forget * ?
E   static half clpy_nextafter_fp16(half x1, half x2){

@vorj

vorj commented Feb 3, 2020

> The path problem has been resolved.

✌️

For Fixstars developers: I edited the build script in the Jenkins configuration to fix this issue, so if we hit the same problem in the future, rewrite ${CUDA_PATH} in the script.
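
A minimal sketch of what that change could look like (the exact Jenkins script contents and the versioned path are assumptions, not the actual configuration):

```shell
# Hypothetical fragment of the Jenkins build script: point CUDA_PATH at the
# versioned install directory instead of relying on the /usr/local/cuda symlink.
export CUDA_PATH=/usr/local/cuda-10.2
./headercvt stub.c -- -I /opt/clpy/llvm-7.1.0/lib/clang/7.1.0/include -I"${CUDA_PATH}/include"
```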

> However, there is another issue. The CUDA 10.2 compiler does not accept our carray.clh.

Ugh, so this issue is related to #224, which means we need to fix that first... This is negligence on NVIDIA's part, isn't it?

@LWisteria
Member

The primary problem here is the CUDA version update on the CI machine (titanv); it's not caused by ClPy itself.
Therefore, the right solution is to pin the CUDA version used for CI.
You may just remove titanv from CI until this is solved.

One of the ideal solutions would be to fix the half/fp16 behavior in ClPy, not just to disable it.
I will open an issue about that.

@ybsh
Collaborator

ybsh commented Feb 6, 2020

I'm installing Docker on titanv so we can run ClPy with cuda-9.2.

@ybsh
Collaborator

ybsh commented Feb 6, 2020

I installed Docker Engine (19.03) and nvidia-docker on titanv.
I tested nvidia-smi with the following command and it worked:

# docker run --gpus all nvidia/cuda:9.2-base nvidia-smi

@ybsh
Collaborator

ybsh commented Feb 14, 2020

I was careless not to notice this, but with the above command nvidia-smi doesn't seem to be running against cuda-9.2.
It says CUDA Version: 10.2. I haven't found out why yet.

# docker run --gpus all nvidia/cuda:9.2-base nvidia-smi
Fri Feb 14 07:23:04 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN V             Off  | 00000000:65:00.0 Off |                  N/A |
| 59%   80C    P2   177W / 250W |   7183MiB / 12064MiB |     86%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

@ybsh
Collaborator

ybsh commented Feb 21, 2020

It turned out the desired version had been installed after all.
The reason nvidia-smi showed v10.2 is that it reports the version supported by the driver API, which is a separate set of interfaces from the runtime API.
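
A quick way to see both numbers (assuming the devel image, which ships nvcc, is available):

```shell
# Driver API version, reported by nvidia-smi and determined by the host driver:
docker run --gpus all nvidia/cuda:9.2-base nvidia-smi | grep "CUDA Version"

# Runtime/toolkit version actually installed inside the image:
docker run --gpus all nvidia/cuda:9.2-devel nvcc --version
```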

@ybsh
Collaborator

ybsh commented Feb 21, 2020

Trying to build clpy with this Dockerfile:

FROM nvidia/cuda:9.2-devel
RUN apt-get update && apt-get install -y \
    clang-6.0 \
    libclang-6.0-dev \
    cmake \
    git \
    python3 \
    python3-pip \
    wget \
    vim
RUN pip3 install \
    cython \
    numpy \
    chainer==3.3.0 \
    pytest
WORKDIR /env
RUN wget https://github.com/CNugteren/CLBlast/archive/1.4.1.tar.gz
RUN tar -zxvf 1.4.1.tar.gz  \
    && rm *.gz \
    && cd CLBlast-1.4.1 \
    && mkdir -p build \
    && cd build \
    && cmake -DCMAKE_BUILD_TYPE=Release .. \
    && make -j8
ENV CLBLAST="/env/CLBlast-1.4.1"
ENV C_INCLUDE_PATH="${CLBLAST}/include:${C_INCLUDE_PATH}"
ENV CPLUS_INCLUDE_PATH="${CLBLAST}/include:${CPLUS_INCLUDE_PATH}"
ENV LIBRARY_PATH="${CLBLAST}/build:${LIBRARY_PATH}"
ENV LD_LIBRARY_PATH="${CLBLAST}/build:${LD_LIBRARY_PATH}"
WORKDIR /app
COPY ./app /app
COPY ./train_mnist.py /app
WORKDIR /app/clpy
RUN sh -c 'python3 setup.py develop 2>&1 | tee build.log'

To run this Dockerfile (consolidated commands below):

  • Create a directory foo and place this Dockerfile inside it
  • cd to foo
  • mkdir -p app and place the ClPy directory inside it
  • Build the image, for example with # docker build -t clpy_test .
  • Run it with # docker run --gpus all -it -d --name test clpy_test /bin/bash
  • Attach with # docker attach test
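
The same steps as a single hedged shell sketch (the location of the ClPy checkout is an assumption):

```shell
# Build context layout: foo/Dockerfile, foo/app/clpy (ClPy checkout),
# foo/train_mnist.py (Chainer's MNIST example script).
mkdir -p foo/app
cd foo
docker build -t clpy_test .                                   # build the image
docker run --gpus all -it -d --name test clpy_test /bin/bash  # start a detached container
docker attach test                                            # attach to it
```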

@ybsh
Collaborator

ybsh commented Feb 21, 2020

The build context contains clpy/ and train_mnist.py taken from Chainer.

@ybsh
Collaborator

ybsh commented Feb 21, 2020

$ python3 -m clpy train_mnist.py -g 0
Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 174, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/lib/python3.5/runpy.py", line 133, in _get_module_details
    return _get_module_details(pkg_main_name, error)
  File "/usr/lib/python3.5/runpy.py", line 109, in _get_module_details
    __import__(pkg_name)
  File "/app/clpy/clpy/__init__.py", line 17, in <module>
    from clpy import core  # NOQA
  File "/app/clpy/clpy/core/__init__.py", line 1, in <module>
    from clpy.core import core  # NOQA
  File "clpy/backend/function.pxd", line 4, in init clpy.core.core
  File "/app/clpy/clpy/backend/__init__.py", line 3, in <module>
    from clpy.backend import compiler  # NOQA
  File "clpy/backend/function.pxd", line 4, in init clpy.backend.compiler
  File "clpy/backend/device.pxd", line 4, in init clpy.backend.function
  File "clpy/backend/device.pyx", line 1, in init clpy.backend.device
  File "clpy/backend/opencl/env.pyx", line 82, in init clpy.backend.opencl.env
  File "clpy/backend/opencl/api.pyx", line 17, in clpy.backend.opencl.api.GetPlatformIDs
  File "clpy/backend/opencl/exceptions.pyx", line 24, in clpy.backend.opencl.exceptions.check_status
clpy.backend.opencl.exceptions.OpenCLRuntimeError: UNKNOWN ERROR: -1001

It seems the build failed...
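
For what it's worth, error -1001 is usually CL_PLATFORM_NOT_FOUND_KHR from the OpenCL ICD loader, i.e. no OpenCL platform was visible at runtime. A few hedged sanity checks inside the container (clinfo is not in the image by default and would need to be installed):

```shell
# Is an NVIDIA OpenCL ICD registered in the container?
ls /etc/OpenCL/vendors/             # expect nvidia.icd
cat /etc/OpenCL/vendors/nvidia.icd  # usually just "libnvidia-opencl.so.1"

# Are the driver's OpenCL libraries mounted by nvidia-docker?
ldconfig -p | grep -i opencl

# List visible platforms/devices (apt-get install -y clinfo first)
clinfo | head -n 20
```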

@ybsh
Collaborator

ybsh commented Feb 21, 2020

Here is the build log of ClPy: build.log

@ybsh
Collaborator

ybsh commented Feb 21, 2020

I observed that at least the CLBlast build was successful.

@ybsh
Collaborator

ybsh commented Feb 21, 2020

At the head of build.log:

readlink: missing operand
Try 'readlink --help' for more information.
dirname: missing operand
Try 'dirname --help' for more information.
nm: '/../lib/libclangTooling.a': No such file
make: 'ultima' is up to date.
make: Nothing to be done for 'build'.
make: Nothing to be done for 'deploy'.
cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
building ultima started
building headercvt
launching headercvt (converting cl.h)...
Options: {'linetrace': False, 'profile': False, 'annotate': False}
Include directories: ['/usr/local/cuda/include']
Library directories: ['/usr/local/cuda/lib64']
building without Cython

"building without Cython" does not look normal. Does someone have an instant thought of what mistake I might have made?

@ybsh
Collaborator

ybsh commented Feb 21, 2020

> nm: '/../lib/libclangTooling.a': No such file

# which clang

The environment does not have clang on its PATH.
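
For context, a hedged guess at where the Ubuntu packages put things (paths assumed from the clang-6.0/libclang-6.0-dev packages, not verified in this image):

```shell
# Ubuntu installs versioned clang binaries and the static clang libraries
# under /usr/lib/llvm-6.0, which is not on the default PATH/LIBRARY_PATH:
ls /usr/bin/clang-6.0                        # versioned launcher
ls /usr/lib/llvm-6.0/bin/clang               # actual binary
ls /usr/lib/llvm-6.0/lib/libclangTooling.a   # the library nm failed to locate
```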

@ybsh
Collaborator

ybsh commented Feb 21, 2020

FROM nvidia/cuda:9.2-devel

RUN apt-get update && apt-get install -y \
    clang-6.0 \
    libclang-6.0-dev \
    cmake \
    git \
    python3 \
    python3-pip \
    wget \
    vim

RUN pip3 install \
    cython \
    numpy \
    chainer==3.3.0 \
    pytest

WORKDIR /env
RUN wget https://github.com/CNugteren/CLBlast/archive/1.4.1.tar.gz
RUN tar -zxvf 1.4.1.tar.gz  \
    && rm *.gz \
    && cd CLBlast-1.4.1 \
    && mkdir -p build \
    && cd build \
    && cmake -DCMAKE_BUILD_TYPE=Release .. \
    && make -j8

ENV CLBLAST="/env/CLBlast-1.4.1"
ENV C_INCLUDE_PATH="${CLBLAST}/include:${C_INCLUDE_PATH}"
ENV CPLUS_INCLUDE_PATH="${CLBLAST}/include:${CPLUS_INCLUDE_PATH}"
ENV LIBRARY_PATH="${CLBLAST}/build:${LIBRARY_PATH}"
ENV LD_LIBRARY_PATH="${CLBLAST}/build:${LD_LIBRARY_PATH}"

WORKDIR /app
COPY ./app /app
COPY ./train_mnist.py /app

ENV CLANG="/usr/lib/llvm-6.0"
ENV PATH="${CLANG}/bin:${PATH}"
ENV CPLUS_INCLUDE_PATH="${CLANG}/include:${CPLUS_INCLUDE_PATH}"
ENV LIBRARY_PATH="${CLANG}lib:${LIBRARY_PATH}"
ENV LD_LIBRARY_PATH="${CLANG}/lib:${LD_LIBRARY_PATH}"

WORKDIR /app/clpy
RUN sh -c 'python3 setup.py develop 2>&1 | tee build.log' # It might be easier to do this in an interactive shell

As you can see, I added clang to PATH and the environment now knows where clang is (confirmed with which), but I still get the same runtime error when running train_mnist.py.
Here again is the log from building ClPy:
build.log

@LWisteria
Member

@ybsh I don't understand why you're writing a Dockerfile. Using an interactive shell seems easier.

@ybsh
Collaborator

ybsh commented Feb 21, 2020

@LWisteria I use the Dockerfile to automate all the steps before compiling ClPy, and do the compilation itself in an interactive shell.

@LWisteria LWisteria added this to the v2.1.0rc2 milestone Feb 22, 2020
@vorj

vorj commented Mar 2, 2020

@ybsh @LWisteria The original problem has been solved by #269.
Will you keep this issue open to work on a good Dockerfile? (Even so, I think it would be better to open a new issue for that rather than continuing this one.)

@ybsh
Collaborator

ybsh commented Mar 2, 2020

@vorj Thank you very much for the fix (which obviated the need to create a separate CUDA 9.2 environment).
I don't think I will, because there is no longer any urgent reason to do that, and I'm working on the bottleneck elimination issue #153.

@vorj

vorj commented Mar 2, 2020

OK, so let's close this.

@vorj vorj closed this as completed Mar 2, 2020