Add CI test for spark and distributed hive. (#1705)

- Fixes #1655
- Fixes #1658
- Fixes #1471 

N.B.: Spark and Hive get stuck when running the distributed tests in separate containers, but not when they share a container. There may be a configuration issue somewhere, so for now we temporarily place Spark and Hive in the same container. This does not affect the test results (the vineyard server communicating with the worker node did receive the requests to create and modify data).

Submitting the example program to YARN in the Spark container worked fine.

Signed-off-by: vegetableysm <[email protected]>
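The YARN submission mentioned above can be sketched as follows — a minimal example using Spark's bundled SparkPi job. The `SPARK_HOME` default, examples-jar version, and paths are assumptions for illustration, not taken from this commit:

```shell
# Hypothetical sketch: submit Spark's bundled SparkPi example to YARN from
# inside the spark container. SPARK_HOME and the examples jar version are
# assumptions; adjust them to match the image actually built.
SPARK_HOME="${SPARK_HOME:-/opt/apache/spark}"
EXAMPLES_JAR="$SPARK_HOME/examples/jars/spark-examples_2.12-3.4.1.jar"

# Build the submit command; running it requires a live YARN cluster.
SUBMIT_CMD="$SPARK_HOME/bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  $EXAMPLES_JAR 10"

echo "$SUBMIT_CMD"
```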
vegetableysm authored Jan 9, 2024
1 parent 07f84f7 commit 495c342
Showing 26 changed files with 419 additions and 184 deletions.
75 changes: 64 additions & 11 deletions .github/workflows/java-ci.yaml
@@ -53,14 +53,6 @@ jobs:
with:
submodules: recursive

- name: Cache for ccache
uses: actions/cache@v3
with:
path: ~/.ccache
key: ${{ runner.os }}-${{ matrix.metadata }}-ccache-${{ hashFiles('**/git-modules.txt') }}
restore-keys: |
${{ runner.os }}-${{ matrix.metadata }}-ccache-
- name: Install Dependencies for Linux
if: runner.os == 'Linux'
run: |
@@ -173,12 +165,73 @@ jobs:
# wait for hive docker ready
sleep 60
- name: Hive test
run: |
pushd java/hive/test
./test.sh
popd
- name: Spark with hive test
run: |
pushd java/hive/test
./spark-hive-test.sh
popd
- name: Stop hive docker
run: |
pushd java/hive/docker
docker-compose -f docker-compose.yaml stop
docker-compose -f docker-compose.yaml rm -f
popd
- name: Build mysql container
run: |
pushd java/hive/docker/dependency/mysql
docker-compose -f ./mysql-compose.yaml up -d
popd
- name: Start vineyard server for hive distributed test
run: |
./build/bin/vineyardd --socket=./build/vineyard_sock/metastore/vineyard.sock -rpc_socket_port=18880 --etcd_endpoint="0.0.0.0:2383" &
./build/bin/vineyardd --socket=./build/vineyard_sock/hiveserver/vineyard.sock -rpc_socket_port=18881 --etcd_endpoint="0.0.0.0:2383" &
./build/bin/vineyardd --socket=./build/vineyard_sock/0/vineyard.sock -rpc_socket_port=18882 --etcd_endpoint="0.0.0.0:2383" &
./build/bin/vineyardd --socket=./build/vineyard_sock/1/vineyard.sock -rpc_socket_port=18883 --etcd_endpoint="0.0.0.0:2383" &
./build/bin/vineyardd --socket=./build/vineyard_sock/2/vineyard.sock -rpc_socket_port=18884 --etcd_endpoint="0.0.0.0:2383" &
- name: Build hadoop cluster
run: |
pushd java/hive/docker
docker-compose -f docker-compose-distributed.yaml up -d
popd
# wait for hive docker ready
sleep 60
- name: Hive distributed test
run: |
pushd java/hive/test
./distributed-test.sh
popd
- name: Spark with hive distributed test
run: |
pushd java/hive/test
./spark-hive-distributed-test.sh
popd
- name: Setup tmate session
if: false
uses: mxschmitt/action-tmate@v3

- name: Hive test
- name: Stop container
run: |
pushd java/hive/test
./test.sh
pushd java/hive/docker
docker-compose -f docker-compose-distributed.yaml stop
docker-compose -f docker-compose-distributed.yaml rm -f
popd
pushd java/hive/docker/dependency/mysql
docker-compose -f ./mysql-compose.yaml stop
docker-compose -f ./mysql-compose.yaml rm -f
popd
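The five vineyardd instances started in the workflow above share one etcd endpoint (`0.0.0.0:2383`) and differ only in IPC socket and RPC port. A hedged Python sketch that pairs them up and checks which sockets are present; actually connecting requires the `vineyard` Python package and running daemons, both assumptions beyond this commit:

```python
# Hedged sketch: enumerate the vineyardd daemons from the CI step above.
# Socket paths and ports are copied from the workflow; connecting assumes
# `pip install vineyard` and daemons actually running.
import os

DAEMONS = [
    ("./build/vineyard_sock/metastore/vineyard.sock", 18880),
    ("./build/vineyard_sock/hiveserver/vineyard.sock", 18881),
    ("./build/vineyard_sock/0/vineyard.sock", 18882),
    ("./build/vineyard_sock/1/vineyard.sock", 18883),
    ("./build/vineyard_sock/2/vineyard.sock", 18884),
]

def live_sockets(daemons=DAEMONS):
    """Return the IPC socket paths that exist on this machine."""
    return [sock for sock, _port in daemons if os.path.exists(sock)]

def connect_all(daemons=DAEMONS):
    """Connect to every daemon over IPC (requires the vineyard package)."""
    import vineyard  # imported lazily so the sketch loads without it
    return [vineyard.connect(sock) for sock, _port in daemons]
```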
4 changes: 2 additions & 2 deletions .gitignore
@@ -27,8 +27,8 @@ cmake-build-debug
*.whl

# hive sql work directory
/java/hive/distributed/docker/mysql/conf/
/java/hive/distributed/docker/mysql/data/
/java/hive/docker/dependency/mysql/conf/
/java/hive/docker/dependency/mysql/data/

# coredump
core.*.*
43 changes: 14 additions & 29 deletions java/hive/README.rst
@@ -306,7 +306,7 @@ Build Hive Docker Image with Hadoop

### Build docker images
```bash
cd v6d/java/hive/distributed
cd v6d/java/hive/docker
./build.sh
```

@@ -322,25 +322,6 @@ Build Hive Docker Image with Hadoop
# You can change the password in mysql-compose.yaml and hive-site.xml
```

### Run hadoop & hive docker images
```bash
cd v6d/java/hive/docker
docker-compose -f docker-compose-distributed.yaml up -d
```

### Create table
```bash
docker exec -it hive-hiveserver2 beeline -u "jdbc:hive2://hive-hiveserver2:10000" -n root
```

```sql
-- in beeline
drop table test_hive1;
create table test_hive1(field int);
insert into table test_hive1 values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10);
select * from test_hive1;
```

Using vineyard as storage
-------------------------

@@ -366,23 +347,27 @@ Using vineyard as storage

### Copy vineyard jars to share dir
```bash
mkdir -p ~/share
mkdir -p v6d/share
cd v6d/java/hive
# you can change share dir in docker-compose.yaml
cp target/vineyard-hive-0.1-SNAPSHOT.jar ~/share
cp target/vineyard-hive-0.1-SNAPSHOT.jar ../../../share
```

### Create table with vineyard
### Run hadoop & hive docker images
```bash
cd v6d/java/hive/docker
docker-compose -f docker-compose-distributed.yaml up -d
```

### Create table
```bash
docker exec -it hive-hiveserver2 beeline -u "jdbc:hive2://hive-hiveserver2:10000" -n root
```

```sql
-- in beeline
drop table test_vineyard;
create table test_vineyard(field int)
stored as Vineyard
location "vineyard:///user/hive_remote/warehouse/test_vineyard";
insert into table test_vineyard values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10);
select * from test_vineyard;
drop table test_hive;
create table test_hive(field int);
insert into table test_hive values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10);
select * from test_hive;
```
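The same beeline session can also be driven from Python. A hedged sketch assuming the PyHive client (`pip install 'pyhive[hive]'`) and HiveServer2 reachable on `localhost:10000` — the library, host, and credentials are assumptions, not part of this commit:

```python
# Hedged sketch: query a table through HiveServer2 from Python via PyHive.
# Host, port, and username are assumptions matching the beeline URL above.

def build_query(table):
    """Build the same SELECT the beeline session runs."""
    return f"SELECT * FROM {table}"

def fetch_all(table, host="localhost", port=10000, username="root"):
    """Run the query against HiveServer2 (requires pyhive[hive])."""
    from pyhive import hive  # imported lazily; assumed dependency
    conn = hive.connect(host=host, port=port, username=username)
    try:
        cursor = conn.cursor()
        cursor.execute(build_query(table))
        return cursor.fetchall()
    finally:
        conn.close()
```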
98 changes: 0 additions & 98 deletions java/hive/docker/README.rst

This file was deleted.

26 changes: 25 additions & 1 deletion java/hive/docker/build.sh
@@ -1,10 +1,11 @@
WORK_DIR=~/hive-workdir
mkdir -p "$WORK_DIR"

find ~/hive-workdir -maxdepth 1 -mindepth 1 ! -name '*.tar.gz' -exec rm -rf {} \;
find ~/hive-workdir -maxdepth 1 -mindepth 1 ! \( -name '*.tar.gz' -o -name '*.tgz' \) -exec rm -rf {} \;

TEZ_VERSION=${TEZ_VERSION:-"0.9.1"}
HIVE_VERSION=${HIVE_VERSION:-"2.3.9"}
SPARK_VERSION=${SPARK_VERSION:-"3.4.1"}

if [ -f "$WORK_DIR/apache-tez-$TEZ_VERSION-bin.tar.gz" ]; then
echo "Tez exists, skipping download..."
@@ -42,13 +43,36 @@ else
fi
fi

if [ -f "$WORK_DIR/spark-$SPARK_VERSION-bin-hadoop3.tgz" ]; then
echo "Spark exists, skipping download..."
else
echo "Download Spark..."
SPARK_URL=${SPARK_URL:-"https://archive.apache.org/dist/spark/spark-$SPARK_VERSION/spark-$SPARK_VERSION-bin-hadoop3.tgz"}
echo "Downloading Spark from $SPARK_URL..."
if ! curl --fail -L "$SPARK_URL" -o "$WORK_DIR/spark-$SPARK_VERSION-bin-hadoop3.tgz"; then
echo "Failed to download Spark, exiting..."
exit 1
fi
fi

cp -R ./dependency/images/ "$WORK_DIR/"
cp ../target/vineyard-hive-0.1-SNAPSHOT.jar "$WORK_DIR/images/"

tar -xzf "$WORK_DIR/apache-hive-$HIVE_VERSION-bin.tar.gz" -C "$WORK_DIR/"
tar -xzf "$WORK_DIR/apache-tez-$TEZ_VERSION-bin.tar.gz" -C "$WORK_DIR/"
tar -xzf "$WORK_DIR/spark-$SPARK_VERSION-bin-hadoop3.tgz" -C "$WORK_DIR/"
mv "$WORK_DIR/apache-hive-$HIVE_VERSION-bin" "$WORK_DIR/images/hive"
mv "$WORK_DIR/apache-tez-$TEZ_VERSION-bin" "$WORK_DIR/images/tez"
mv "$WORK_DIR/spark-$SPARK_VERSION-bin-hadoop3" "$WORK_DIR/images/spark"

network_name="hadoop-network"

if [[ -z $(docker network ls --filter name=^${network_name}$ --format="{{.Name}}") ]]; then
echo "Docker network ${network_name} does not exist, creating it..."
docker network create hadoop-network
else
echo "Docker network ${network_name} already exists"
fi

docker build \
"$WORK_DIR/images" \
12 changes: 10 additions & 2 deletions java/hive/docker/dependency/images/Dockerfile
@@ -24,7 +24,14 @@ COPY hive /opt/apache/hive/
COPY hive-config/ /hive-config
COPY hive-config-distributed/ /hive-config-distributed

ENV PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${HIVE_HOME}/bin:${PATH}
# prepare spark
COPY spark /opt/apache/spark/
ENV SPARK_HOME=/opt/apache/spark
COPY spark-config/ /spark-config
COPY spark-config-distributed/ /spark-config-distributed
COPY ./vineyard-hive-0.1-SNAPSHOT.jar ${SPARK_HOME}/jars/

ENV PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${HIVE_HOME}/bin:${SPARK_HOME}/bin:${PATH}

COPY bootstrap.sh /opt/apache/
COPY mysql-connector-java-5.1.49/mysql-connector-java-5.1.49-bin.jar ${HIVE_HOME}/lib/
@@ -35,4 +42,5 @@ RUN sudo yum -y install unzip; \
sudo rm /usr/lib64/libstdc++.so.6; \
sudo ln -s /usr/lib64/libstdc++.so.6.0.26 /usr/lib64/libstdc++.so.6; \
sudo yum -y install vim; \
rm libstdc.so_.6.0.26.zip libstdc++.so.6.0.26;
rm libstdc.so_.6.0.26.zip libstdc++.so.6.0.26; \
sudo yum install -y net-tools
4 changes: 4 additions & 0 deletions java/hive/docker/dependency/images/hive-config/hive-site.xml
@@ -112,4 +112,8 @@
<name>hive.metastore.client.connect.retry.delay</name>
<value>5s</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://metastore:9083</value>
</property>
</configuration>
@@ -0,0 +1 @@
spark.yarn.stagingDir=file:///tmp/
