diff --git a/blog/7-streampark-usercase-haibo.md b/blog/7-streampark-usercase-haibo.md
index 6e80ef2c5..5badbe01a 100644
--- a/blog/7-streampark-usercase-haibo.md
+++ b/blog/7-streampark-usercase-haibo.md
@@ -1,103 +1,103 @@
---
slug: streampark-usercase-haibo
-title: StreamPark 一站式计算利器在海博科技的生产实践,助力智慧城市建设
-tags: [StreamPark, 生产实践, FlinkSQL]
+title: StreamPark, an All-in-One Computing Tool, in Haibo Tech's Production Practice, Facilitating Smart City Construction
+tags: [StreamPark, Production Practice, FlinkSQL]
---
-**摘要:**本文「 StreamPark 一站式计算利器在海博科技的生产实践,助力智慧城市建设 」作者是海博科技大数据架构师王庆焕,主要内容为:
+**Summary:** This article, "StreamPark, an All-in-One Computing Tool, in Haibo Tech's Production Practice, Facilitating Smart City Construction", was written by Wang Qinghuan, Big Data Architect at Haibo Tech. The main topics covered are:
-1. 选择 StreamPark
-2. 快速上手
-3. 应用场景
-4. 功能扩展
-5. 未来期待
+1. Choosing StreamPark
+2. Getting Started Quickly
+3. Application Scenarios
+4. Feature Extensions
+5. Future Expectations
-海博科技是一家行业领先的人工智能物联网产品和解决方案公司。目前在公共安全、智慧城市、智慧制造领域,为全国客户提供包括算法、软件和硬件产品在内的全栈式整体解决方案。
+Haibo Tech is an industry-leading AI and IoT products and solutions company. It currently provides full-stack, end-to-end solutions, including algorithms, software, and hardware products, to customers nationwide in the public safety, smart city, and smart manufacturing domains.
-## **01. 选择 StreamPark**
+## **01. Choosing StreamPark**
-海博科技自 2020 年开始使用 Flink SQL 汇聚、处理各类实时物联数据。随着各地市智慧城市建设步伐的加快,需要汇聚的各类物联数据的数据种类、数据量也不断增加,导致线上维护的 Flink SQL 任务越来越多,一个专门的能够管理众多 Flink SQL 任务的计算平台成为了迫切的需求。
+Haibo Tech has been using Flink SQL to aggregate and process various types of real-time IoT data since 2020. As smart city construction accelerates in cities across the country, both the variety and the volume of IoT data to be aggregated keep growing, and the number of Flink SQL tasks maintained online keeps rising with them, making a dedicated computing platform for managing these many Flink SQL tasks an urgent need.
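The real-time aggregation tasks described in this post are plain Flink SQL jobs. A minimal sketch of such a job, reading IoT events from Kafka and writing them to Elasticsearch, might look like the following; all table names, fields, topics, and endpoints are illustrative assumptions, not taken from Haibo's actual environment:

```sql
-- Hypothetical source: IoT events arriving on a Kafka topic
CREATE TABLE iot_events (
  device_id STRING,
  event_time TIMESTAMP(3),
  payload STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'iot-events',
  'properties.bootstrap.servers' = 'kafka:9092',
  'scan.startup.mode' = 'latest-offset',
  'format' = 'json'
);

-- Hypothetical sink: an Elasticsearch 7 index
CREATE TABLE iot_events_es (
  device_id STRING,
  event_time TIMESTAMP(3),
  payload STRING
) WITH (
  'connector' = 'elasticsearch-7',
  'hosts' = 'http://elasticsearch:9200',
  'index' = 'iot_events'
);

-- The aggregation task itself: forward processed rows to the sink
INSERT INTO iot_events_es
SELECT device_id, event_time, payload
FROM iot_events;
```

Submitted through StreamPark, a statement set like this is the entire job definition; only the connector dependency JARs need to be attached alongside it.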
-在体验对比了 Apache Zeppelin 和 StreamPark 之后,我们选择了 StreamPark 作为公司的实时计算平台。相比 Apache Zeppelin, StreamPark 并不出名。‍‍‍‍‍‍‍‍‍‍‍‍但是在体验了 StreamPark 发行的初版,阅读其设计文档后,我们发现其基于 **一站式** 设计的思想,能够覆盖 Flink 任务开发的全生命周期,使得配置、开发、部署、运维全部在一个平台即可完成。我们的开发、运维、测试的同学可以使用 StreamPark 协同工作,**低代码** + **一站式** 的设计思想坚定了我们使用 StreamPark 的信心。
+After trying and comparing Apache Zeppelin and StreamPark, we chose StreamPark as the company's real-time computing platform. Compared to Apache Zeppelin, StreamPark is not well known. However, after trying the first release of StreamPark and reading its design documentation, we found that its **all-in-one** design philosophy covers the entire lifecycle of Flink task development, so that configuration, development, deployment, and operations can all be completed on a single platform. Our development, operations, and testing colleagues can collaborate on StreamPark, and its **low-code** + **all-in-one** design philosophy solidified our confidence in adopting StreamPark.
-//视频链接( StreamX 官方闪屏)
+// Video link (StreamX official video)
-## **02. 落地实践**
+## **02. Practical Implementation**
-### **1. 快速上手**
+### **1. Quick Start**
-使用 StreamPark 完成一个实时汇聚任务就像把大象放进冰箱一样简单,仅需三步即可完成:
+Completing a real-time aggregation task with StreamPark is as easy as putting an elephant into a refrigerator; it takes just three steps:
-- 编辑 SQL
+- Edit the SQL
![](/blog/haibo/flink_sql.png)
-- 上传依赖包
+- Upload the dependency packages
![](/blog/haibo/dependency.png)
-- 部署运行
+- Deploy and run
![](/blog/haibo/deploy.png)
-仅需上述三步,即可完成 Mysql 到 Elasticsearch 的汇聚任务,大大提升数据接入效率。
+With just these three steps, you can complete an aggregation task from MySQL to Elasticsearch, significantly improving data ingestion efficiency.
-### **2. 生产实践**
+### **2. Production Practice**
-StreamPark 在海博主要用于运行实时 Flink SQL任务: 读取 Kafka 上的数据,进行处理输出至 Clickhouse 或者 Elasticsearch 中。
+At Haibo, StreamPark is primarily used to run real-time Flink SQL tasks: reading data from Kafka, processing it, and writing the results to ClickHouse or Elasticsearch.
-从2021年10月开始,公司逐渐将 Flink SQL 任务迁移至 StreamPark 平台来集中管理,承载我司实时物联数据的汇聚、计算、预警。
+Starting in October 2021, the company gradually migrated its Flink SQL tasks to the StreamPark platform for centralized management; StreamPark now carries the aggregation, computation, and alerting of our real-time IoT data.
-截至目前,StreamPark 已在多个政府、公安生产环境进行部署,汇聚处理城市实时物联数据、人车抓拍数据。以下是在某市专网部署的 StreamPark 平台截图 :
+To date, StreamPark has been deployed in multiple government and public security production environments, aggregating and processing real-time city IoT data and snapshot data of people and vehicles. Below is a screenshot of the StreamPark platform deployed on one city's dedicated network:
![](/blog/haibo/application.png)
-## **03. 应用场景**
+## **03. Application Scenarios**
-#### **1. 实时物联感知数据汇聚**
+#### **1. Real-time IoT Sensing Data Aggregation**
-汇聚实时的物联感知数据,我们直接使用 StreamPark 开发 Flink SQL 任务,针对 Flink SQL 未提供的方法,StreamPark 也支持 Udf 相关功能,用户通过 StreamPark 上传 Udf 包,即可在 SQL 中调用相关 Udf,实现更多复杂的逻辑操作。
+To aggregate real-time IoT sensing data, we develop Flink SQL tasks directly in StreamPark. For functionality that Flink SQL does not provide, StreamPark also supports UDFs: users upload a UDF package through StreamPark and can then call the UDF in their SQL to implement more complex logic.
-“SQL+UDF” 的方式,能够满足我们绝大部分的数据汇聚场景,如果后期业务变动,也只需要在 StreamPark 中修改 SQL 语句,即可完成业务变更与上线。
+The "SQL + UDF" approach covers the vast majority of our data aggregation scenarios. If the business changes later, we only need to modify the SQL statement in StreamPark to apply the change and bring it online.
![](/blog/haibo/data_aggregation.png)
-#### **2. Flink CDC数据库同步**
+#### **2. 
Flink CDC Database Synchronization**
-为了实现各类数据库与数据仓库之前的同步,我们使用 StreamPark 开发 Flink CDC SQL 任务。借助于 Flink CDC 的能力,实现了 Oracle 与 Oracle 之间的数据同步, Mysql/Postgresql 与 Clickhouse 之间的数据同步。
+To synchronize various databases with our data warehouses, we use StreamPark to develop Flink CDC SQL tasks. With the capabilities of Flink CDC, we have implemented Oracle-to-Oracle data synchronization, as well as synchronization from MySQL/PostgreSQL to ClickHouse.
![](/blog/haibo/flink_cdc.png)
-**3. 数据分析模型管理**
+**3. Data Analysis Model Management**
-针对无法使用 Flink SQL 需要开发 Flink 代码的任务,例如: 实时布控模型、离线数据分析模型,StreamPark 提供了 Custom code 的方式, 允许用户上传可执行的 Flink Jar 包并运行。
+For tasks that cannot be expressed in Flink SQL and require custom Flink code, such as real-time surveillance models and offline data analysis models, StreamPark offers a Custom code mode that allows users to upload and run executable Flink JAR packages.
-目前,我们已经将人员,车辆等 20 余类分析模型上传至 StreamPark,交由 StreamPark 管理运行。
+To date, we have uploaded more than 20 categories of analysis models, covering people, vehicles, and more, to StreamPark, which manages and runs them.
![](/blog/haibo/data_aggregation.png)
-**综上:** 无论是 Flink SQL 任务还是 Custome code 任务,StreamPark 均提供了很好的支持,满足各种不同的业务场景。 但是 StreamPark 缺少任务调度的能力,如果你需要定期调度任务, StreamPark 目前无法满足。社区成员正在努力开发调度相关的模块,在即将发布的 1.2.3 中 会支持任务调度功能,敬请期待。
+**In summary:** StreamPark provides excellent support for both Flink SQL tasks and Custom code tasks, covering a wide range of business scenarios. However, StreamPark lacks task scheduling capabilities: if you need to run tasks on a schedule, StreamPark cannot yet meet that need. Community members are actively developing a scheduling module, and the upcoming 1.2.3 release will support task scheduling, so stay tuned.
-## **04. 功能扩展**
+## **04. Feature Extensions**
-Datahub 是 Linkedin 开发的一个元数据管理平台,提供了数据源管理、数据血缘、数据质量检查等功能。海博科技基于 StreamPark 和 Datahub 进行了二次开发,实现了数据表级/字段级的血缘功能。通过数据血缘功能,帮助用户检查 Flink SQL 的字段血缘关系。并将血缘关系保存至Linkedin/Datahub 元数据管理平台。
+DataHub is a metadata management platform developed by LinkedIn that offers data source management, data lineage, data quality checks, and more. Haibo Tech has carried out further development on top of StreamPark and DataHub, implementing table-level and field-level lineage. The data lineage feature helps users inspect the field-level lineage of their Flink SQL and saves the lineage relationships to the LinkedIn DataHub metadata management platform.
-//两个视频链接(基于 StreamX 开发的数据血缘功能)
+// Two video links (data lineage feature developed on top of StreamX)
-## **05. 未来期待**
+## **05. Future Expectations**
-目前,StreamPark 社区的 Roadmap 显示 StreamPark 1.3.0 将迎来全新的 Workbench 体验、统一的资源管理中心 (JAR/UDF/Connectors 统一管理)、批量任务调度等功能。这也是我们非常期待的几个全新功能。
+Currently, the StreamPark community roadmap shows that StreamPark 1.3.0 will bring a brand-new Workbench experience, a unified resource management center (unified management of JARs, UDFs, and Connectors), batch task scheduling, and more. These are the new features we are most looking forward to.
-Workbench 将使用全新的工作台式的 SQL 开发风格,选择数据源即可生成 SQL,进一步提升 Flink 任务开发效率。统一的 UDF 资源中心将解决当前每个任务都要上传依赖包的问题。批量任务调度功能将解决当前 StreamPark 无法调度任务的遗憾。
+The Workbench will adopt a brand-new workbench-style SQL development experience: select a data source and the SQL can be generated, further improving Flink task development efficiency. The unified UDF resource center will remove the current need to upload dependency packages for every task. The batch task scheduling feature will address StreamPark's current inability to schedule tasks.
-下图是 StreamPark 开发者设计的原型图,敬请期待。
+Below is a prototype designed by the StreamPark developers; stay tuned.
![](/blog/haibo/data_source.png) diff --git a/i18n/zh-CN/docusaurus-plugin-content-blog/7-streampark-usercase-haibo.md b/i18n/zh-CN/docusaurus-plugin-content-blog/7-streampark-usercase-haibo.md index 8a5e0041e..d8713a1f6 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-blog/7-streampark-usercase-haibo.md +++ b/i18n/zh-CN/docusaurus-plugin-content-blog/7-streampark-usercase-haibo.md @@ -22,7 +22,7 @@ tags: [StreamPark, 生产实践, FlinkSQL] 在体验对比了 Apache Zeppelin 和 StreamPark 之后,我们选择了 StreamPark 作为公司的实时计算平台。相比 Apache Zeppelin, StreamPark 并不出名。‍‍‍‍‍‍‍‍‍‍‍‍但是在体验了 StreamPark 发行的初版,阅读其设计文档后,我们发现其基于 **一站式** 设计的思想,能够覆盖 Flink 任务开发的全生命周期,使得配置、开发、部署、运维全部在一个平台即可完成。我们的开发、运维、测试的同学可以使用 StreamPark 协同工作,**低代码** + **一站式** 的设计思想坚定了我们使用 StreamPark 的信心。 -//视频链接( StreamX 官方闪屏) +//视频链接( StreamX 官方视频)
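The Flink CDC synchronization described in the post above follows the same SQL-only pattern. A minimal sketch of a MySQL CDC job is shown below; the `mysql-cdc` connector comes from the flink-connector-mysql-cdc project, and all table names, hosts, and credentials are illustrative assumptions (in production the sink would target ClickHouse via a JDBC-style connector rather than `print`):

```sql
-- Hypothetical CDC source: a MySQL table captured with flink-connector-mysql-cdc
CREATE TABLE orders_src (
  id BIGINT,
  status STRING,
  updated_at TIMESTAMP(3),
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = 'mysql-host',
  'port' = '3306',
  'username' = 'flink_user',
  'password' = 'flink_pwd',
  'database-name' = 'shop',
  'table-name' = 'orders'
);

-- Hypothetical sink; 'print' is used here only to keep the sketch self-contained
CREATE TABLE orders_sink (
  id BIGINT,
  status STRING,
  updated_at TIMESTAMP(3)
) WITH (
  'connector' = 'print'
);

-- Continuously replicate the inserts, updates, and deletes captured by CDC
INSERT INTO orders_sink
SELECT id, status, updated_at
FROM orders_src;
```

Because the CDC source emits a changelog stream, the same `INSERT INTO ... SELECT` keeps the sink in sync as the source table changes, with no extra ingestion code to maintain.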