New versions 3.4.1 and 3.5.0 #291

Merged: 14 commits, Oct 18, 2023
11 changes: 9 additions & 2 deletions CHANGELOG.md
@@ -9,11 +9,13 @@ All notable changes to this project will be documented in this file.
- Default stackableVersion to operator version. It is recommended to remove `spec.image.stackableVersion` from your custom resources ([#267], [#268]).
- Configuration overrides for the JVM security properties, such as DNS caching ([#272]).
- Support PodDisruptionBudgets for HistoryServer ([#288]).
+- Support for versions 3.4.1, 3.5.0 ([#291]).
+- History server now exports metrics via jmx exporter (port 18081) ([#291]).

### Changed

-- `vector` `0.26.0` -> `0.31.0` ([#269]).
-- `operator-rs` `0.44.0` -> `0.52.1` ([#267], [#275], [#288]).
+- `vector` `0.26.0` -> `0.33.0` ([#269], [#291]).
+- `operator-rs` `0.44.0` -> `0.55.0` ([#267], [#275], [#288], [#291]).
- Removed usages of SPARK_DAEMON_JAVA_OPTS since it's not a reliable way to pass extra JVM options ([#272]).
- [BREAKING] use product image selection instead of version ([#275]).
- [BREAKING] refactored application roles to use `CommonConfiguration` structures from the operator framework ([#277]).
@@ -23,6 +25,10 @@ All notable changes to this project will be documented in this file.

- Dynamic loading of Maven packages ([#281]).

+### Removed
+
+- Removed support for versions 3.2.1, 3.3.0 ([#291]).

[#267]: https://github.com/stackabletech/spark-k8s-operator/pull/267
[#268]: https://github.com/stackabletech/spark-k8s-operator/pull/268
[#269]: https://github.com/stackabletech/spark-k8s-operator/pull/269
@@ -32,6 +38,7 @@ All notable changes to this project will be documented in this file.
[#281]: https://github.com/stackabletech/spark-k8s-operator/pull/281
[#286]: https://github.com/stackabletech/spark-k8s-operator/pull/286
[#288]: https://github.com/stackabletech/spark-k8s-operator/pull/288
+[#291]: https://github.com/stackabletech/spark-k8s-operator/pull/291

## [23.7.0] - 2023-07-14

8 changes: 4 additions & 4 deletions Cargo.lock

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -21,7 +21,7 @@ serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
serde_yaml = "0.9"
snafu = "0.7"
-stackable-operator = { git = "https://github.com/stackabletech/operator-rs.git", tag = "0.52.1" }
+stackable-operator = { git = "https://github.com/stackabletech/operator-rs.git", tag = "0.55.0" }
strum = { version = "0.25", features = ["derive"] }
tokio = { version = "1.29", features = ["full"] }
tracing = "0.1"
80 changes: 60 additions & 20 deletions deploy/helm/spark-k8s-operator/crds/crds.yaml

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/modules/spark-k8s/examples/example-encapsulated.yaml
@@ -6,7 +6,7 @@ metadata:
spec:
version: "1.0"
sparkImage:
-productVersion: 3.3.0 # <1>
+productVersion: 3.5.0 # <1>
mode: cluster
mainClass: org.apache.spark.examples.SparkPi
mainApplicationFile: /stackable/spark/examples/jars/spark-examples.jar # <2>
2 changes: 1 addition & 1 deletion docs/modules/spark-k8s/examples/example-history-app.yaml
@@ -6,7 +6,7 @@ metadata:
spec:
version: "1.0"
sparkImage:
-productVersion: 3.3.0
+productVersion: 3.5.0
pullPolicy: IfNotPresent
mode: cluster
mainClass: org.apache.spark.examples.SparkPi
@@ -5,7 +5,7 @@ metadata:
name: spark-history
spec:
image:
-productVersion: 3.3.0
+productVersion: 3.5.0
logFileDirectory: # <1>
s3:
prefix: eventlogs/ # <2>
@@ -7,7 +7,7 @@ metadata:
spec:
version: "1.0"
sparkImage:
-productVersion: 3.3.0
+productVersion: 3.5.0
mode: cluster
mainApplicationFile: s3a://stackable-spark-k8s-jars/jobs/ny-tlc-report-1.1.0.jar # <3>
mainClass: tech.stackable.demo.spark.NYTLCReport
@@ -7,7 +7,7 @@ metadata:
spec:
version: "1.0"
sparkImage:
-productVersion: 3.3.0
+productVersion: 3.5.0
mode: cluster
mainApplicationFile: s3a://stackable-spark-k8s-jars/jobs/ny_tlc_report.py # <1>
args:
@@ -8,7 +8,7 @@ spec:
version: "1.0"
image: docker.stackable.tech/stackable/ny-tlc-report:0.1.0 # <1>
sparkImage:
-productVersion: 3.3.0
+productVersion: 3.5.0
mode: cluster
mainApplicationFile: local:///stackable/spark/jobs/ny_tlc_report.py # <2>
args:
2 changes: 1 addition & 1 deletion docs/modules/spark-k8s/examples/example-sparkapp-pvc.yaml
@@ -7,7 +7,7 @@ metadata:
spec:
version: "1.0"
sparkImage:
-productVersion: 3.3.0
+productVersion: 3.5.0
mode: cluster
mainApplicationFile: s3a://stackable-spark-k8s-jars/jobs/ny-tlc-report-1.0-SNAPSHOT.jar # <1>
mainClass: org.example.App # <2>
@@ -6,7 +6,7 @@ metadata:
spec:
version: "1.0"
sparkImage:
-productVersion: 3.3.0
+productVersion: 3.5.0
mode: cluster
mainApplicationFile: s3a://my-bucket/spark-examples.jar # <1>
mainClass: org.apache.spark.examples.SparkPi # <2>
@@ -7,7 +7,7 @@ metadata:
spec:
version: "1.0"
sparkImage:
-productVersion: 3.3.0
+productVersion: 3.5.0
mode: cluster
mainApplicationFile: local:///stackable/spark/examples/src/main/python/streaming/hdfs_wordcount.py
args:
@@ -59,7 +59,7 @@ metadata:
spec:
version: "1.0"
sparkImage:
-productVersion: 3.3.0
+productVersion: 3.5.0
mode: cluster
mainApplicationFile: local:///stackable/spark/examples/src/main/python/pi.py
driver:
@@ -59,7 +59,7 @@ metadata:
spec:
version: "1.0"
sparkImage:
-productVersion: 3.3.0
+productVersion: 3.5.0
mode: cluster
mainApplicationFile: local:///stackable/spark/examples/src/main/python/pi.py
driver:
2 changes: 1 addition & 1 deletion docs/modules/spark-k8s/pages/crd-reference.adoc
@@ -24,7 +24,7 @@ Below are listed the CRD fields that can be defined by the user:
|User-supplied image containing spark-job dependencies that will be copied to the specified volume mount

|`spec.sparkImage`
-| Spark image which will be deployed to driver and executor pods, which must contain spark environment needed by the job e.g. `docker.stackable.tech/stackable/spark-k8s:3.3.0-stackable0.3.0`
+| Spark image which will be deployed to driver and executor pods, which must contain spark environment needed by the job e.g. `docker.stackable.tech/stackable/spark-k8s:3.5.0-stackable0.0.0-dev`

|`spec.sparkImagePullPolicy`
| Optional Enum (one of `Always`, `IfNotPresent` or `Never`) that determines the pull policy of the spark job image
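For orientation, here is a minimal `SparkApplication` sketch combining the fields listed above. This is a hedged example: it mirrors the structured `sparkImage` form used in the examples elsewhere in these docs, and the metadata name is illustrative.

[source,yaml]
----
---
apiVersion: spark.stackable.tech/v1alpha1
kind: SparkApplication
metadata:
  name: spark-pi  # illustrative name
spec:
  version: "1.0"
  sparkImage:
    productVersion: 3.5.0
  sparkImagePullPolicy: IfNotPresent  # one of Always, IfNotPresent, Never
  mode: cluster
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: /stackable/spark/examples/jars/spark-examples.jar
----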
@@ -76,7 +76,7 @@ For a role group of the Spark history server, you can specify: `configOverrides`

The `security.properties` file is used to configure JVM security properties. It is very seldom that users need to tweak any of these, but there is one use-case that stands out, and that users need to be aware of: the JVM DNS cache.

-The JVM manages it's own cache of successfully resolved host names as well as a cache of host names that cannot be resolved. Some products of the Stackable platform are very sensible to the contents of these caches and their performance is heavily affected by them. As of version 3.4.0, Apache Spark may perform poorly if the positive cache is disabled. To cache resolved host names, and thus speeding up queries you can configure the TTL of entries in the positive cache like this:
+The JVM manages its own cache of successfully resolved host names as well as a cache of host names that cannot be resolved. Some products of the Stackable platform are very sensitive to the contents of these caches, and their performance is heavily affected by them. As of version 3.4.0, Apache Spark may perform poorly if the positive cache is disabled. To cache resolved host names, and thus speed up queries, you can configure the TTL of entries in the positive cache like this:

[source,yaml]
----
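# The concrete snippet is collapsed in this diff view; the lines below are a
# hedged reconstruction. The keys are the standard JVM security properties
# for DNS caching; the role name `nodes` is an assumption for the history
# server role.
spec:
  nodes:
    configOverrides:
      security.properties:
        networkaddress.cache.ttl: "30"
        networkaddress.cache.negative.ttl: "0"
----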
5 changes: 3 additions & 2 deletions docs/modules/spark-k8s/partials/supported-versions.adoc
@@ -3,5 +3,6 @@
// Stackable Platform documentation.
// Please sort the versions in descending order (newest first)

-- 3.4.0 (Hadoop 3.3.4, Scala 2.12, Python 3.11, Java 11)
-- 3.3.0 (Hadoop 3.3.3, Scala 2.12, Python 3.9, Java 11)
+- 3.5.0 (Hadoop 3.3.4, Scala 2.12, Python 3.11, Java 11)
+- 3.4.1 (Hadoop 3.3.4, Scala 2.12, Python 3.11, Java 11)
+- 3.4.0 (Hadoop 3.3.4, Scala 2.12, Python 3.11, Java 11) (deprecated)
4 changes: 2 additions & 2 deletions examples/README-examples.md
@@ -50,10 +50,10 @@ Several resources are needed in this store. These can be loaded like this:

````text
kubectl exec minio-mc-0 -- sh -c 'mc alias set test-minio http://test-minio:9000/'
-kubectl cp examples/ny-tlc-report-1.1.0-3.3.0.jar minio-mc-0:/tmp
+kubectl cp examples/ny-tlc-report-1.1.0-3.5.0.jar minio-mc-0:/tmp
kubectl cp apps/ny_tlc_report.py minio-mc-0:/tmp
kubectl cp examples/yellow_tripdata_2021-07.csv minio-mc-0:/tmp
-kubectl exec minio-mc-0 -- mc cp /tmp/ny-tlc-report-1.1.0-3.3.0.jar test-minio/my-bucket
+kubectl exec minio-mc-0 -- mc cp /tmp/ny-tlc-report-1.1.0-3.5.0.jar test-minio/my-bucket
kubectl exec minio-mc-0 -- mc cp /tmp/ny_tlc_report.py test-minio/my-bucket
kubectl exec minio-mc-0 -- mc cp /tmp/yellow_tripdata_2021-07.csv test-minio/my-bucket
````
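To sanity-check the upload, you can list the bucket with the same alias. This is a hedged extra step, not part of the original instructions:

````text
kubectl exec minio-mc-0 -- mc ls test-minio/my-bucket
````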
2 changes: 1 addition & 1 deletion examples/ny-tlc-report-external-dependencies.yaml
@@ -7,7 +7,7 @@ metadata:
spec:
version: "1.0"
sparkImage:
-productVersion: 3.3.0
+productVersion: 3.5.0
pullPolicy: IfNotPresent
mode: cluster
mainApplicationFile: s3a://my-bucket/ny_tlc_report.py
2 changes: 1 addition & 1 deletion examples/ny-tlc-report-image.yaml
@@ -8,7 +8,7 @@ spec:
version: "1.0"
# everything under /jobs will be copied to /stackable/spark/jobs
image: docker.stackable.tech/stackable/ny-tlc-report:0.1.0
-sparkImage: docker.stackable.tech/stackable/pyspark-k8s:3.3.0-stackable0.0.0-dev
+sparkImage: docker.stackable.tech/stackable/spark-k8s:3.5.0-stackable0.0.0-dev
sparkImagePullPolicy: IfNotPresent
mode: cluster
mainApplicationFile: local:///stackable/spark/jobs/ny_tlc_report.py
4 changes: 2 additions & 2 deletions examples/ny-tlc-report.yaml
@@ -14,9 +14,9 @@ metadata:
spec:
version: "1.0"
sparkImage:
-productVersion: 3.3.0
+productVersion: 3.5.0
mode: cluster
-mainApplicationFile: s3a://my-bucket/ny-tlc-report-1.1.0-3.3.0.jar
+mainApplicationFile: s3a://my-bucket/ny-tlc-report-1.1.0-3.5.0.jar
mainClass: tech.stackable.demo.spark.NYTLCReport
volumes:
- name: cm-job-arguments
2 changes: 1 addition & 1 deletion rust/crd/src/affinity.rs
@@ -47,7 +47,7 @@ mod test {
name: spark-history
spec:
image:
-productVersion: 3.3.0
+productVersion: 3.5.0
logFileDirectory:
s3:
prefix: eventlogs/
24 changes: 18 additions & 6 deletions rust/operator-binary/src/history/history_controller.rs
@@ -5,7 +5,6 @@ use stackable_operator::{
builder::{ConfigMapBuilder, ContainerBuilder, ObjectMetaBuilder, PodBuilder, VolumeBuilder},
cluster_resources::{ClusterResourceApplyStrategy, ClusterResources},
commons::product_image_selection::ResolvedProductImage,
-duration::Duration,
k8s_openapi::{
api::{
apps::v1::{StatefulSet, StatefulSetSpec},
@@ -31,6 +30,7 @@ use stackable_operator::{
},
},
role_utils::RoleGroupRef,
+time::Duration,
};
use stackable_spark_k8s_crd::{
constants::{
@@ -55,6 +55,8 @@ use stackable_operator::k8s_openapi::DeepMerge;
use stackable_operator::logging::controller::ReconcilerError;
use strum::{EnumDiscriminants, IntoStaticStr};

+const METRICS_PORT: u16 = 18081;

#[derive(Snafu, Debug, EnumDiscriminants)]
#[strum_discriminants(derive(IntoStaticStr))]
#[allow(clippy::enum_variant_names)]
@@ -415,6 +417,7 @@ fn build_stateful_set(
.command(vec!["/bin/bash".to_string()])
.args(command_args(s3_log_dir))
.add_container_port("http", 18080)
.add_container_port("metrics", METRICS_PORT.into())
.add_env_vars(env_vars(s3_log_dir))
.add_volume_mounts(s3_log_dir.volume_mounts())
.add_volume_mount(VOLUME_MOUNT_NAME_CONFIG, VOLUME_MOUNT_PATH_CONFIG)
@@ -515,15 +518,23 @@ fn build_service(
.ownerreference_from_resource(shs, None, Some(true))
.context(ObjectMissingMetadataForOwnerRefSnafu)?
.with_recommended_labels(labels(shs, app_version_label, &group_name))
.with_label("prometheus.io/scrape", "true")
.build(),
spec: Some(ServiceSpec {
type_: Some(service_type),
cluster_ip: service_cluster_ip,
-ports: Some(vec![ServicePort {
-name: Some(String::from("http")),
-port: 18080,
-..ServicePort::default()
-}]),
+ports: Some(vec![
+ServicePort {
+name: Some(String::from("http")),
+port: 18080,
+..ServicePort::default()
+},
+ServicePort {
+name: Some(String::from("metrics")),
+port: METRICS_PORT.into(),
+..ServicePort::default()
+},
+]),
selector: Some(selector),
..ServiceSpec::default()
}),
@@ -634,6 +645,7 @@ fn env_vars(s3logdir: &S3LogDir) -> Vec<EnvVar> {
format!(
"-Djava.security.properties={VOLUME_MOUNT_PATH_CONFIG}/{JVM_SECURITY_PROPERTIES_FILE}"
),
format!("-javaagent:/stackable/jmx/jmx_prometheus_javaagent.jar={METRICS_PORT}:/stackable/jmx/config.yaml")
];
if tlscerts::tls_secret_name(&s3logdir.bucket.connection).is_some() {
history_opts.extend(
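Taken together, the controller now opens a `metrics` container port, appends the JMX Prometheus javaagent to the history server's JVM options, and publishes both ports on the role-group Service. A hedged sketch of the Service this renders, where the name is illustrative and the ports and scrape label follow the code above:

````yaml
apiVersion: v1
kind: Service
metadata:
  name: spark-history-node-default  # illustrative name
  labels:
    prometheus.io/scrape: "true"
spec:
  ports:
    - name: http
      port: 18080
    - name: metrics
      port: 18081  # METRICS_PORT, served by the JMX Prometheus javaagent
````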
4 changes: 2 additions & 2 deletions rust/operator-binary/src/pod_driver_controller.rs
@@ -1,6 +1,6 @@
use stackable_operator::{
-client::Client, duration::Duration, k8s_openapi::api::core::v1::Pod,
-kube::runtime::controller::Action,
+client::Client, k8s_openapi::api::core::v1::Pod, kube::runtime::controller::Action,
+time::Duration,
};
use stackable_spark_k8s_crd::{
constants::POD_DRIVER_CONTROLLER_NAME, SparkApplication, SparkApplicationStatus,
2 changes: 1 addition & 1 deletion rust/operator-binary/src/spark_k8s_controller.rs
@@ -6,7 +6,7 @@ use std::{
vec,
};

-use stackable_operator::{duration::Duration, product_config::writer::to_java_properties_string};
+use stackable_operator::{product_config::writer::to_java_properties_string, time::Duration};
use stackable_spark_k8s_crd::{
constants::*, s3logdir::S3LogDir, tlscerts, RoleConfig, SparkApplication, SparkApplicationRole,
SparkContainer, SubmitConfig,
18 changes: 6 additions & 12 deletions tests/README-templating.md
@@ -16,13 +16,11 @@ An example of the content for the test definition file is shown here:
dimensions:
- name: spark
values:
-- 3.2.1
-- 3.2.2
-- 3.2.3
+- 3.4.1
+- 3.5.0
- name: hadoop
values:
-- 3.1.0
-- 3.2.0
+- 3.3.4
- name: aws
- abc
- xyz
@@ -39,12 +37,8 @@ In this example the test case uses only two of the three dimensions defined, so

````text
└── spark-pi-public-s3
-├── spark-3.2.1_hadoop-3.1.0
-├── spark-3.2.1_hadoop-3.2.0
-├── spark-3.2.2_hadoop-3.1.0
-├── spark-3.2.2_hadoop-3.2.0
-├── spark-3.2.3_hadoop-3.1.0
-└── spark-3.2.3_hadoop-3.2.0
+├── spark-3.4.1_hadoop-3.3.4
+└── spark-3.5.0_hadoop-3.3.4
````

The name of a test case defined under `tests` in this file has to refer back to a directory in the `templates/kuttl` directory, which will be used to create the test scenarios.
@@ -61,7 +55,7 @@ tests
````

The `kuttl-test.yaml.jinja2` cannot currently be edited, as it comes from the operator templating and any changes would be overwritten again.
-This should be fairly easy to solve and we can look at this as soon as it becomes necessary.
+This should be fairly easy to solve, and we can look at this as soon as it becomes necessary.

## Using

5 changes: 0 additions & 5 deletions tests/templates/kuttl/iceberg/10-assert.yaml.j2
@@ -9,9 +9,4 @@ kind: SparkApplication
metadata:
name: pyspark-iceberg
status:
-{% if test_scenario['values']['spark'].startswith("3.3") %}
-# Spark 3.3 is expected to fail because of this https://issues.apache.org/jira/browse/SPARK-35084
-phase: Failed
-{% else %}
phase: Succeeded
-{% endif %}
3 changes: 2 additions & 1 deletion tests/templates/kuttl/iceberg/10-deploy-spark-app.yaml.j2
@@ -45,7 +45,8 @@ spec:
mountPath: /stackable/spark/jobs
deps:
packages:
-- org.apache.iceberg:iceberg-spark-runtime-{{ test_scenario['values']['spark'].rstrip('.0') }}_2.12:1.3.1
+# need to extract only the major and minor versions
+- org.apache.iceberg:iceberg-spark-runtime-{{ test_scenario['values']['spark'].rsplit('.', maxsplit=1)[0] }}_2.12:1.4.0
volumes:
- name: script
configMap:
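The new expression keeps only the major and minor components of the Spark version, which is what the Iceberg runtime artifact name expects; the old `rstrip('.0')` only strips trailing `.` and `0` characters and so mangles versions not ending in `.0`. For example, with the `3.5.0` dimension value from this PR's test definitions:

````text
{{ "3.5.0".rsplit('.', maxsplit=1)[0] }}  ->  3.5
rendered package: org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.0
````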
Binary test fixtures changed (not rendered in this view); among them, tests/templates/kuttl/smoke/spark-examples_3.3.0.jar was removed.