[Improve] Replace All Instances of "StreamPark" with "Apache StreamPark" in Official Documentation #322
@@ -1,7 +1,7 @@
 ---
 slug: flink-development-framework-streampark
-title: StreamPark - Powerful Flink Development Framework
-tags: [StreamPark, DataStream, FlinkSQL]
+title: Apache StreamPark™ - Powerful Flink Development Framework
+tags: [Apache StreamPark™, DataStream, FlinkSQL]
 ---

 Although the Hadoop system is widely used today, its architecture is complicated, its maintenance complexity is high, version upgrades are challenging, and, for organizational reasons, data center scheduling cycles are long. We urgently need to explore an agile data platform model. With the current popularization of cloud-native architecture and lake-warehouse integration, we have decided to use Doris as an offline data warehouse and TiDB (already in production) as a real-time data platform. Furthermore, because Doris supports ODBC external tables over MySQL, it can integrate external database resources and output reports uniformly.
@@ -56,18 +56,18 @@ However, because object storage requires the entire object to be rewritten for r

 <br/>

-## Introducing StreamPark
+## Introducing Apache StreamPark™

 Previously, when we wrote Flink SQL, we generally used Java to wrap SQL, packed it into a jar package, and submitted it to the S3 platform through the command line. This approach has always been unfriendly; the process is cumbersome, and the costs for development and operations are too high. We hoped to further streamline the process by abstracting the Flink TableEnvironment, letting the platform handle initialization, packaging, and running Flink tasks, and automating the building, testing, and deployment of Flink applications.

-This is an era of open-source uprising. Naturally, we turned our attention to the open-source realm: among numerous open-source projects, after comparing various projects, we found that both Zeppelin and StreamPark provide substantial support for Flink and both claim to support Flink on K8s. Eventually, both were shortlisted for our selection. Here's a brief comparison of their support for K8s (if there have been updates since, please kindly correct).
+This is an era of open-source uprising. Naturally, we turned our attention to the open-source realm: among numerous open-source projects, after comparing various projects, we found that both Zeppelin and Apache StreamPark™ provide substantial support for Flink and both claim to support Flink on K8s. Eventually, both were shortlisted for our selection. Here's a brief comparison of their support for K8s (if there have been updates since, please kindly correct).

 <table>
 <thead>
 <tr>
 <td>Feature</td>
 <td>Zeppelin</td>
-<td>StreamPark</td>
+<td>Apache StreamPark™</td>
 </tr>
 </thead>
 <tbody>
@@ -123,15 +123,15 @@ This is an era of open-source uprising. Naturally, we turned our attention to th

 <br/>

-During our research process, we communicated with the main developers of both tools multiple times. After our repeated studies and assessments, we eventually decided to adopt StreamPark as our primary Flink development tool for now.
+During our research process, we communicated with the main developers of both tools multiple times. After our repeated studies and assessments, we eventually decided to adopt Apache StreamPark™ as our primary Flink development tool for now.

 <video src="http://assets.streamxhub.com/streamx-video.mp4" controls="controls" width="100%" height="100%"></video>

-<center style={{"color": "gray"}}>(StreamPark's official splash screen)</center>
+<center style={{"color": "gray"}}>(Apache StreamPark™'s official splash screen)</center>

 <br/>

-After extended development and testing by our team, StreamPark currently boasts:
+After extended development and testing by our team, Apache StreamPark™ currently boasts:

 * Comprehensive <span style={{"color": "red"}}>SQL validation capabilities</span>
 * It has achieved <span style={{"color": "red"}}>automatic build/push for images</span>
@@ -143,21 +143,21 @@ This effectively addresses most of the challenges we currently face in developme

 <video src="http://assets.streamxhub.com/streamx-1.2.0.mp4" controls="controls" width="100%" height="100%"></video>

-<center style={{"color": "gray"}}>(Demo video showcasing StreamPark's support for multiple Flink versions)</center>
+<center style={{"color": "gray"}}>(Demo video showcasing Apache StreamPark™'s support for multiple Flink versions)</center>

 <br/>

-In its latest release, version 1.2.0, StreamPark provides robust support for both K8s-Native-Application and K8s-Session-Application modes.
+In its latest release, version 1.2.0, Apache StreamPark™ provides robust support for both K8s-Native-Application and K8s-Session-Application modes.

 <video src="http://assets.streamxhub.com/streamx-k8s.mp4" controls="controls" width="100%" height="100%"></video>

-<center style={{"color": "gray"}}>(StreamPark's K8s deployment demo video)</center>
+<center style={{"color": "gray"}}>(Apache StreamPark™'s K8s deployment demo video)</center>

 <br/>

 ### K8s Native Application Mode

-Within StreamPark, all we need to do is configure the relevant parameters, fill in the corresponding dependencies in the Maven POM, or upload the dependency jar files. Once we click on 'Apply', the specified dependencies will be generated. This implies that we can also compile all the UDFs we use into jar files, as well as various connector.jar files, and use them directly in SQL. As illustrated below:
+Within Apache StreamPark™, all we need to do is configure the relevant parameters, fill in the corresponding dependencies in the Maven POM, or upload the dependency jar files. Once we click on 'Apply', the specified dependencies will be generated. This implies that we can also compile all the UDFs we use into jar files, as well as various connector.jar files, and use them directly in SQL. As illustrated below:

 ![](/blog/belle/dependency.png)
@@ -169,7 +169,7 @@ We can also specify resources, designate dynamic parameters within Flink Run as

 ![](/blog/belle/pod.png)

-After saving the program, when clicking to run, we can also specify a savepoint. Once the task is successfully submitted, StreamPark will, based on the FlinkPod's network Exposed Type (be it loadBalancer, NodePort, or ClusterIp), return the corresponding WebURL, seamlessly enabling a WebUI redirect. However, as of now, due to security considerations within our online private K8s cluster, there hasn't been a connection established between the Pod and client node network (and there's currently no plan for this). Hence, we only employ NodePort. If the number of future tasks increases significantly, and there's a need for ClusterIP, we might consider deploying StreamPark in K8s or further integrate it with Ingress.
+After saving the program, when clicking to run, we can also specify a savepoint. Once the task is successfully submitted, Apache StreamPark™ will, based on the FlinkPod's network Exposed Type (be it loadBalancer, NodePort, or ClusterIp), return the corresponding WebURL, seamlessly enabling a WebUI redirect. However, as of now, due to security considerations within our online private K8s cluster, there hasn't been a connection established between the Pod and client node network (and there's currently no plan for this). Hence, we only employ NodePort. If the number of future tasks increases significantly, and there's a need for ClusterIP, we might consider deploying Apache StreamPark™ in K8s or further integrate it with Ingress.

 ![](/blog/belle/start.png)
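The NodePort exposure described in the changed paragraph maps to a standard Flink configuration key. As a hedged illustration only (the cluster id, image name, and jar path below are placeholders, not values from the original post), an application-mode submission pinned to NodePort might look like:

```shell
# Submit a Flink job in native K8s application mode, exposing the
# JobManager REST service (and thus the WebUI) via NodePort.
# All identifiers here are illustrative placeholders.
./bin/flink run-application \
    --target kubernetes-application \
    -Dkubernetes.cluster-id=my-flink-app \
    -Dkubernetes.container.image=myrepo/flink-job:latest \
    -Dkubernetes.rest-service.exposed.type=NodePort \
    local:///opt/flink/usrlib/my-job.jar
```

Switching `kubernetes.rest-service.exposed.type` to `ClusterIP` would match the alternative the authors mention considering alongside Ingress.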
@@ -185,7 +185,7 @@ Below is the specific submission process in the K8s Application mode:

 ### K8s Native Session Mode

-StreamPark also offers robust support for the <span style={{"color": "red"}}> K8s Native-Session mode</span>, which lays a solid technical foundation for our subsequent offline FlinkSQL development or for segmenting certain resources.
+Apache StreamPark™ also offers robust support for the <span style={{"color": "red"}}> K8s Native-Session mode</span>, which lays a solid technical foundation for our subsequent offline FlinkSQL development or for segmenting certain resources.

 To use the Native-Session mode, one must first use the Flink command to create a Flink cluster that operates within K8s. For instance:
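The concrete command falls outside this hunk, so as a sketch only (cluster id, namespace, and resource sizes are assumptions, not the post's actual values), creating a native-K8s session cluster with Flink's bundled script looks like:

```shell
# Start a Flink session cluster inside Kubernetes; its cluster-id is what
# gets filled into the platform's "Kubernetes ClusterId" task parameter.
# All values are illustrative placeholders.
./bin/kubernetes-session.sh \
    -Dkubernetes.cluster-id=flink-session-cluster \
    -Dkubernetes.namespace=flink \
    -Dtaskmanager.memory.process.size=4096m \
    -Dkubernetes.taskmanager.cpu=2 \
    -Dtaskmanager.numberOfTaskSlots=4
```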
@@ -203,48 +203,48 @@ To use the Native-Session mode, one must first use the Flink command to create a

 ![](/blog/belle/flinksql.png)

-As shown in the image above, we use that ClusterId as the Kubernetes ClusterId task parameter for StreamPark. Once the task is saved and submitted, it quickly transitions to a 'Running' state:
+As shown in the image above, we use that ClusterId as the Kubernetes ClusterId task parameter for Apache StreamPark™. Once the task is saved and submitted, it quickly transitions to a 'Running' state:

 ![](/blog/belle/detail.png)

 Following the application info's WebUI link:

 ![](/blog/belle/dashboard.png)

-It becomes evident that StreamPark essentially uploads the jar package to the Flink cluster through REST API and then schedules the task for execution.
+It becomes evident that Apache StreamPark™ essentially uploads the jar package to the Flink cluster through REST API and then schedules the task for execution.

 <br/>

 ### Custom Code Mode

-To our delight, StreamPark also provides support for coding DataStream/FlinkSQL tasks. For special requirements, we can achieve our implementations in Java/Scala. You can compose tasks following the scaffold method recommended by StreamPark or write a standard Flink task. By adopting this approach, we can delegate code management to git, utilizing the platform for automated compilation, packaging, and deployment. Naturally, if functionality can be achieved via SQL, we would prefer not to customize DataStream, thereby minimizing unnecessary operational complexities.
+To our delight, Apache StreamPark™ also provides support for coding DataStream/FlinkSQL tasks. For special requirements, we can achieve our implementations in Java/Scala. You can compose tasks following the scaffold method recommended by Apache StreamPark™ or write a standard Flink task. By adopting this approach, we can delegate code management to git, utilizing the platform for automated compilation, packaging, and deployment. Naturally, if functionality can be achieved via SQL, we would prefer not to customize DataStream, thereby minimizing unnecessary operational complexities.

 <br/><br/>

 # 4. Feedback and Future Directions

 ## Suggestions for Improvement

-StreamPark, similar to any other new tools, does have areas for further enhancement based on our current evaluations:
+Apache StreamPark™, similar to any other new tools, does have areas for further enhancement based on our current evaluations:

 * **Strengthening Resource Management**: Features like multi-file system jar resources and robust task versioning are still awaiting additions.
 * **Enriching Frontend Features**: For instance, once a task is added, functionalities like copying could be integrated.
 * **Visualization of Task Submission Logs**: The process of task submission involves loading class files, jar packaging, building and submitting images, and more. A failure at any of these stages could halt the task. However, error logs are not always clear, or due to some anomaly, the exceptions aren't thrown as expected, leaving users puzzled about rectifications.

-It's a universal truth that innovations aren't perfect from the outset. Although minor issues exist and there are areas for improvement with StreamPark, its merits outweigh its limitations. As a result, we've chosen StreamPark as our Flink DevOps platform. We're also committed to collaborating with its main developers to refine StreamPark further. We wholeheartedly invite others to use it and contribute towards its advancement.
+It's a universal truth that innovations aren't perfect from the outset. Although minor issues exist and there are areas for improvement with Apache StreamPark™, its merits outweigh its limitations. As a result, we've chosen Apache StreamPark™ as our Flink DevOps platform. We're also committed to collaborating with its main developers to refine Apache StreamPark™ further. We wholeheartedly invite others to use it and contribute towards its advancement.

 <br/>

 ## Future Prospects

 * We'll keep our focus on Doris and plan to unify business data with log data in Doris, leveraging Flink to realize lakehouse capabilities.
-* Our next step is to explore integrating StreamPark with DolphinScheduler 2.x. This would enhance DolphinScheduler's offline tasks, and gradually we aim to replace Spark with Flink for a unified batch-streaming solution.
+* Our next step is to explore integrating Apache StreamPark™ with DolphinScheduler 2.x. This would enhance DolphinScheduler's offline tasks, and gradually we aim to replace Spark with Flink for a unified batch-streaming solution.
 * Drawing from our own experiments with S3, after building the fat-jar, we're considering bypassing image building. Instead, we'll mount PVC directly to the Flink Pod's directory using Pod Template, refining the code submission process even further.
-* We plan to persistently implement StreamPark in our production environment. Collaborating with community developers, we aim to boost StreamPark's Flink stream development, deployment, and monitoring capabilities. Our collective vision is to evolve StreamPark into a holistic stream data DevOps platform.
+* We plan to persistently implement Apache StreamPark™ in our production environment. Collaborating with community developers, we aim to boost Apache StreamPark™'s Flink stream development, deployment, and monitoring capabilities. Our collective vision is to evolve Apache StreamPark™ into a holistic stream data DevOps platform.

 Resources:

-StreamPark GitHub: [https://github.com/apache/incubator-streampark](https://github.com/apache/incubator-streampark) <br/>
+Apache StreamPark™ GitHub: [https://github.com/apache/incubator-streampark](https://github.com/apache/incubator-streampark) <br/>
 Doris GitHub: [https://github.com/apache/doris](https://github.com/apache/doris)

 ![](/blog/belle/author.png)
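The "mount PVC via Pod Template" idea in the Future Prospects list can be sketched with Flink's native pod-template support. This is a hypothetical sketch, not the authors' implementation: the PVC name, mount path, and jar name are assumptions, and the exact pod-template config key varies across Flink releases.

```shell
# Write a pod template that mounts an existing PVC (holding the fat-jar)
# into the Flink main container, skipping the image-build step entirely.
# "flink-usrlib-pvc" and "my-job.jar" are illustrative placeholders.
cat > pod-template.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: jobmanager-pod-template
spec:
  containers:
    - name: flink-main-container   # name expected by Flink's native K8s integration
      volumeMounts:
        - name: usrlib
          mountPath: /opt/flink/usrlib
  volumes:
    - name: usrlib
      persistentVolumeClaim:
        claimName: flink-usrlib-pvc
EOF

# Point the application-mode submission at the template.
./bin/flink run-application \
    --target kubernetes-application \
    -Dkubernetes.cluster-id=my-flink-app \
    -Dkubernetes.pod-template-file=pod-template.yaml \
    local:///opt/flink/usrlib/my-job.jar
```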