- title: RAG Monitoring AMP
description: |-
Build a monitoring dashboard over a RAG (Retrieval Augmented Generation) system. The dashboard should be able to monitor the model's performance and provide insights into the model's behavior. The AMP uses either AWS Bedrock Models or Cloudera AI Inferencing for indexing, response generation, and evaluation.
To give it a try, simply navigate to the AMP Catalog and select the new RAG Monitoring AMP.
category: Feature
isNew: true
icon: announcement
tags:
- Verta
- Monitoring
- Evaluation
- Bedrock
- MLflow
- Qdrant
- Streamlit
- FastAPI
- RAG
- AI
- LLM
link: "https://github.com/cloudera/CML_AMP_RAG_Monitoring"
date: '2024-12-11T00:00:00Z'
targetAudiences: ["PC"]
- title: RAG Studio AMP
description: |-
RAG Studio is a no-code application built on the Cloudera platform that enables you to create RAG chatbots powered by your enterprise data in minutes. Designed for accessibility, it bridges the gap between business and IT teams, driving collaboration in AI projects.
To give it a try, simply navigate to the AMP Catalog and select the new RAG Studio AMP.
category: Feature
isNew: true
icon: announcement
tags:
- Verta
- OpenAI
- GenAI
- Prompts
- Python
- RAG
- LLM
link: "https://github.com/cloudera/CML_AMP_RAG_Studio"
date: '2024-12-11T00:00:00Z'
targetAudiences: ["PC"]
- title: Chat with your documents AMP
description: |-
Chat with your documents AMP demonstrates using an open source pre-trained instruction-following LLM (Large Language Model) to build a ChatBot-like web application. The responses of the LLM are enhanced by giving it context from an internal knowledge base. This context is retrieved by using an open-source Vector Database for semantic search.
To try it, navigate to the AMP Catalog and select the new Chat with your documents AMP.
category: Feature
isNew: true
icon: announcement
tags:
- GenAI
- RAG
- Milvus
- Python
- LLM
- Llama Index
link: https://github.com/cloudera/LlamaIndex_IN_CML_AMP
date: '2024-10-03T00:00:00Z'
targetAudiences: ["PC", "PVC"]
- title: Fine Tuning Studio AMP
description: |-
Fine Tuning Studio is a one-stop shop for managing, training, evaluating, and deploying large language models. Train models of any size, on any dataset, to fine-tune your models for domain-specific use cases. Fine Tuning Studio comes with a powerful, customizable training and evaluation system that can be accessed both through a UI and through a Python-accessible client.
To give it a try, simply navigate to the AMP Catalog and select the new Fine Tuning Studio AMP.
category: Feature
isNew: true
icon: announcement
tags:
- GenAI
- LLM
- Fine Tuning
- QLoRA
- Adapters
- Evaluation
- Finetuning
- Prompts
- Python
link: https://github.com/cloudera/CML_AMP_LLM_Fine_Tuning_Studio
date: '2024-10-03T00:00:00Z'
targetAudiences: ["PC", "PVC"]
- title: AMP - Knowledge Graph powered RAG based QA application
description: |-
This AMP spins up a knowledge graph powered RAG application that can answer AI/ML questions by drawing on the latest research publications. The knowledge base consists of ~650 AI/ML papers from arXiv, and the citation relationships between them are captured as "edges" in the knowledge graph, which is powered by Neo4j. Additional information from the knowledge graph is used to better rerank the text chunks retrieved by vector search and to suggest papers related to the answer to assist the user's research.
To give it a try, simply navigate to the AMP Catalog and select the new Knowledge Graph AMP. The AMP requires GPU access and a Hugging Face API token.
descriptionHtml: |-
This AMP spins up a knowledge graph powered RAG application that can answer AI/ML questions by drawing on the latest research publications. The knowledge base consists of ~650 AI/ML papers from arXiv, and the citation relationships between them are captured as "edges" in the knowledge graph, which is powered by Neo4j. Additional information from the knowledge graph is used to better rerank the text chunks retrieved by vector search and to suggest papers related to the answer to assist the user's research.
To give it a try, simply navigate to the AMP Catalog and select the new Knowledge Graph AMP. The AMP requires GPU access and a Hugging Face API token.
category: Feature
isNew: true
icon: announcement
tags:
- GraphDB
- RAG
- Neo4j
- LLM
- Knowledge-Graph
- GenAI
link: https://github.com/cloudera/CML_AMP_Knowledge_Graph_Backed_RAG
date: '2024-09-18T00:00:00Z'
targetAudiences: ["PC", "PVC"]
- title: Model Registry is Generally Available!
description: |-
The Model Registry serves as a centralized hub for storing, managing, and deploying machine learning models and their associated metadata. This powerful tool streamlines the MLOps process, allowing you to effortlessly develop, deploy, and maintain machine learning models in a production environment.
To get started, go to the Model Registry tab on the left. Please ensure that the Model Registry service is available; if not, work with your administrator to deploy it.
descriptionHtml: |-
The Model Registry serves as a centralized hub for storing, managing, and deploying machine learning models and their associated metadata. This powerful tool streamlines the MLOps process, allowing you to effortlessly develop, deploy, and maintain machine learning models in a production environment.
To get started, go to the Model Registry tab on the left. Please ensure that the Model Registry service is available; if not, work with your administrator to deploy it.
category: Feature
isNew: true
icon: announcement
tags:
- cml
- modelregistry
link: https://community.cloudera.com/t5/What-s-New-Cloudera/Cloudera-s-Model-Registry-is-Now-Generally-Available-GA/ba-p/378002
date: '2023-10-23T00:00:00Z'
targetAudiences: ["PC"]
- title: PromptBrew AMP
description: |-
PromptBrew offers AI-powered assistance in creating high-performing and reliable prompts. Whether you're starting with your project goals or a draft prompt, PromptBrew guides you through a few simple steps to generate and provide new candidate prompts for enhancement. These improved prompts can then be seamlessly integrated into your project and leaderboard.
To give it a try, simply navigate to the AMP Catalog and select the new PromptBrew AMP.
category: Feature
isNew: true
icon: announcement
tags:
- Verta
- OpenAI
- GenAI
- Prompts
- Python
- Prompt Engineering
- LLM
link: "https://github.com/cloudera/CML_AMP_PromptBrew"
date: '2024-09-13T00:00:00Z'
targetAudiences: ["PC", "PVC"]
- title: AMP - Using Amazon Bedrock for Text Summarization and More
description: |-
Amazon Bedrock is a new AWS Cloud service which allows convenient API access to a number of text and image generation models that can be accessed via an AWS account. In this AMP you can play with instructions and text to see how these different models respond to summarization use cases.
To give it a try, simply navigate to the AMP Catalog and select the new LLM AMP. This AMP does not require GPUs, but does require an AWS account with access to Amazon Bedrock.
descriptionHtml: |-
Amazon Bedrock is a new AWS Cloud service which allows convenient API access to a number of text and image generation models that can be accessed via an AWS account. In this AMP you can play with instructions and text to see how these different models respond to summarization use cases.
To give it a try, simply navigate to the AMP Catalog and select the new LLM AMP. This AMP does not require GPUs, but does require an AWS account with access to Amazon Bedrock.
category: Feature
isNew: true
icon: announcement
tags:
- cml
- llm
- bedrock
- amazon
link: https://community.cloudera.com/t5/Community-Articles/New-Cloudera-AMP-with-Amazon-Bedrock-Integration-Now/ta-p/377071
date: '2023-09-29T00:00:00Z'
targetAudiences: ["PC", "PVC"]
- title: AMP - Fine Tuning a Foundation Model for Multiple Tasks
description: |-
Fine Tuning a Foundation Model using techniques like Parameter-Efficient Fine-Tuning (PEFT) and Quantization (QLoRA) is demonstrated in the new AMP as a way to enable enterprises to improve performance of LLMs for their specific tasks using their enterprise data. This is all accomplished without enterprises having to upload any of their data to any external service, while retaining full control over the fine-tuned model itself and where it is hosted and served.
To give it a try, simply navigate to the AMP Catalog and select the new LLM AMP. This AMP requires NVIDIA GPUs; if you don't have access to them, work with your administrator to enable them.
descriptionHtml: |-
Fine Tuning LLMs using techniques like Parameter-Efficient Fine-Tuning (PEFT) and Quantization (QLoRA) is demonstrated in the new AMP as a way to enable enterprises to improve performance of LLMs for their specific tasks using their enterprise data. This is all accomplished without enterprises having to upload any of their data to any external service, while retaining full control over the fine-tuned model itself and where it is hosted and served.
To give it a try, simply navigate to the AMP Catalog and select the new LLM AMP. This AMP requires NVIDIA GPUs; if you don't have access to them, work with your administrator to enable them.
category: Feature
isNew: true
icon: announcement
tags:
- cml
- llm
link: https://www.youtube.com/watch?v=ROy4b_-w-Iw
date: '2023-09-15T00:00:00Z'
targetAudiences: ["PC", "PVC"]
- title: AMP - LLM Chatbot Augmented with Enterprise Data
description: |-
This AMP demonstrates how enterprises can seamlessly integrate their own documentation with an LLM to generate more accurate, factual responses, thus creating their own subject matter expert ChatBot. This is all accomplished without a single external API call, running everything within CML.
To give it a try, simply navigate to the AMP Catalog and select the LLM AMP. If you don't see the AMP, ask your administrator to go to 'Site Administration > AMPs' and refresh the catalog. This AMP requires NVIDIA GPUs; if you don't have access to them, work with your administrator to enable them.
descriptionHtml: |-
This new AMP demonstrates how enterprises can seamlessly integrate their own documentation with an LLM to generate more accurate, factual responses, thus creating their own subject matter expert ChatBot. This is all accomplished without a single external API call, running everything within CML.
To give it a try, simply navigate to the AMP Catalog and select the new LLM AMP. This AMP requires NVIDIA GPUs; if you don't have access to them, work with your administrator to enable them.
category: Feature
isNew: true
icon: announcement
tags:
- cml
- llm
link: https://www.youtube.com/watch?v=WBH9hYDyHKU
date: '2023-05-22T00:00:00Z'
targetAudiences: ["PC", "PVC"]
- title: Simplified Data Ingestion through "Add Data" feature
description: |-
The "Add Data" action on CML’s Data Connections allows users to easily upload data into CDP. This new capability simplifies the process of bringing data to CDP and enables Data Scientists to directly ingest their data without depending on administrators or data engineers.
To get started with this feature, open the “Data” tab in your CML Project, click the "Add Data" action on the CDP Data Connection you wish to use, and follow the prompts to upload your data into a CDP Data Store.
category: Feature
isNew: false
icon: announcement
tags:
- cml
- data
- cdv
link: https://community.cloudera.com/t5/What-s-New-Cloudera/Cloudera-Machine-Learning-launches-quot-Add-Data-quot/ba-p/369588
date: '2023-05-02T00:00:00Z'
targetAudiences: ["PC"]
- title: Simplify Data Access with Custom Connection Support in CML
description: |-
The Custom Connection Support enables data scientists to seamlessly connect to external data stores from within CML. This feature helps data scientists discover all of their data independently, without worrying about implementation and connectivity details, unlocking their machine learning use cases from the get-go.
Custom Connections need to be configured by Administrators to make connections to legacy on-prem databases (Oracle, MSSQL, MySQL), serverless cloud databases (Redshift, Snowflake, SAP HANA Cloud, BigQuery), APIs, or specialized data stores (Neo4j).
If you have requirements for external data, you can work with your Administrator to get them enabled.
category: Feature
isNew: false
icon: announcement
tags:
- cml
- data
- cdv
link: https://community.cloudera.com/t5/What-s-New-Cloudera/Simplify-Data-Access-with-Custom-Connection-Support-in-CML/ba-p/369585
date: '2023-05-01T00:00:00Z'
targetAudiences: ["PC"]
- title: Experiments powered by MLflow
description: |-
CML's Experiments feature, powered by MLflow, enables data scientists to track and visualize experiment results.
CML Experiments have been rebuilt, leveraging the MLflow ecosystem to complement CML's existing strengths in model development and deployment. CML now ships the mlflow SDK and an integrated visual experience that enables experiment tracking and comparison via flexible visuals.
category: Feature
icon: announcement
tags:
- experiments
- mlflow
- cml
link: https://docs.cloudera.com/machine-learning/cloud/experiments/topics/ml-experiments-v2.html
date: '2022-10-30T00:00:00Z'
targetAudiences: ["PC"]
- title: PBJ Workbench Runtimes
description: |-
The ML Runtimes release ships the GA version of the workbench architecture, the PBJ (Powered by Jupyter) Workbench. In the previous Workbench editor, a Cloudera-specific custom messaging protocol was used as a communication channel between CML and Runtimes. PBJ Runtimes use Jupyter components, so user code and third-party libraries are more consistent with their behavior in Jupyter-based environments. This enables a wider variety of rich visualization libraries out of the box, makes troubleshooting easier, and reduces dependency conflicts.
For example, Python 3's input() function now works. Go ahead and try it out!
category: Feature
icon: announcement
tags:
- jupyter
- workbench
- runtimes
link: https://docs.cloudera.com/machine-learning/cloud/runtimes/topics/ml-pbj-workbench-requirements.html
date: '2022-10-29T00:00:00Z'
targetAudiences: ["PC"]
- title: Iceberg connection support
description: |-
As businesses adopt and build open lakehouses with Apache Iceberg on CDP, they need easy data access so that data scientists don’t have to spend their time figuring out connection dependencies and configurations.
CML’s Data Connection and Snippet support simplifies data access in CDP. Data scientists can use the cml.data library to gain access to the Data Lake via Spark or query their Virtual Warehouse with Hive or Impala. With recent improvements to the cml.data library, CML Snippets now fully support the Iceberg table format for all Spark, Hive, and Impala data connections.
category: Feature
icon: announcement
tags:
- data-discovery-exploration
- cdp
- cml
link: https://blog.cloudera.com/one-line-away-from-your-data/
date: '2022-09-01T00:00:00Z'
targetAudiences: ["PC"]
- title: Data Discovery & Visualization
description: |-
The Data Discovery and Visualization experience ships with preconfigured Data Connections, a database browser, interactive SQL editor, drag-and-drop Visual Dashboarding, and Connection Snippets.
These capabilities speed up the development process by cutting down the time spent on finding, exploring, understanding, and accessing the data. Data Scientists need to fully understand their data in order to properly analyze it, build models, and power ML use cases. To reduce friction between the different steps of discovery and exploration and support collaboration within the data science team, CML ships all tools to accelerate the data science process and reduce the time to insights.
category: Feature
icon: announcement
tags:
- data-discovery-exploration
- cdp
- cml
- hdp
link: https://docs.cloudera.com/r/data-discovery-visualization
date: '2022-06-01T00:00:00Z'
targetAudiences: ["PC"]
- title: Data Connections and Snippets
description: |-
Cloudera Machine Learning now offers Snippets to connect to Data Sources available within the CDP Environment. Administrators can configure custom Spark, Hive or Impala Virtual Warehouse data connections manually or they can use CML’s features to autodetect and configure all connections from the same CDP Environment. Data Scientists can then access the preconfigured Data Connections from their ML Projects.
The Data Connection and Snippet support simplifies getting started on ML Projects. Once a Project is created, users are offered code snippets the first time they create a session, so they can connect to their selected data store. Users don’t need to look up the connection boilerplate in the documentation or copy example code from other projects; they can initiate the connection via CML’s connection library and immediately start solving their business problems.
category: Feature
icon: announcement
tags:
- data-discovery-exploration
- cdp
- cml
- hdp
link: https://docs.cloudera.com/r/data-connections-snippets
date: '2022-01-01T00:00:00Z'
targetAudiences: ["PC", "PVC"]
- title: Project-level ML Runtime configuration in CML
description: "Project-level configuration for ML Runtimes adds the ability to limit
the available Runtime selection for a particular project. Users and project owners
who would like to have more control over the available ML Runtimes for their project,
or would like to keep up to date with new ML Runtimes, can now specify the preferred
list of images available for a project.\n\nWith the broad set of released ML Runtimes
and available customization options, choosing the right Runtime for a project
might be difficult and project owners might prefer controlling the available options
themselves. This feature, available from the user interface, now gives a convenient
way to limit the available options by adding runtimes based on the supported Editor,
Kernel, Edition and Version. It also provides a way to indicate and include newly
released versions of the configured runtimes via a simple mouse click. \n\nTo
get started, try creating a new project, where now Basic and Advanced options
are available for selecting and pre-populating the project with the selected ML
Runtimes. \n\nExisting projects can keep on using any Runtime that’s available
in the Workspace until a list is specified under the project settings."
category: Feature
icon: process
tags:
- runtimes
- cdp
- cml
- hdp
link: https://docs.cloudera.com/r/project-level-runtime-configurations
date: '2022-01-01T00:00:00Z'
targetAudiences: ["PC", "PVC"]
- title: Cloudera Machine Learning APIv2
description: |-
Cloudera Machine Learning’s APIv2 enables automated project lifecycle management, CI/CD integration, and more. APIv2 provides CML users with the ability to programmatically create, read, update and delete projects and workloads, including jobs, models and applications. This means that users can automate creation and setup of projects, or trigger actions such as retraining or deploying a new version of a model as part of the project lifecycle. All this is enabled from within the product or from an external scheduling or CI/CD tool, using the Python client library or HTTPS REST API.
In addition to creating production-ready ML models and applications, machine learning engineers must also take those models and applications to production, which often involves deploying to a different environment. Further, these applications are rarely deployed once and left alone. Rather, many models will be updated with further versions as modellers continue to iterate, or as the profile of incoming data changes. These activities are challenging and susceptible to error if they can only be done manually through a UI. CML’s API now allows ML engineers to script these deployment and maintenance events, schedule or automate them, or integrate with external process and approval workflows by calling APIs based on a suitable trigger.
category: Release
icon: release
tags:
- api-v2
- announcement
- cml
link: https://docs.cloudera.com/machine-learning/cloud/api/topics/ml-api-v2.html
date: '2021-09-27T00:00:00Z'
targetAudiences: ["PC", "PVC"]
- title: Apache Spark 3 is now available in CML
description: |-
Cloudera Machine Learning now offers multi-version Spark support. Users of CML can select the Spark version they want to use for each workload. With multi-version Spark support, users now have access to Spark 3 and can take advantage of the 30% performance and stability improvements in the latest version of Spark (based on internal TPC-DS benchmarks).
Data Scientists can run workloads in both Spark 2 and Spark 3 within the same CML Workspace, thus maintaining backwards compatibility with existing workloads while developing new applications on the latest version of Spark. Spark can be configured as a Runtime Addon per workload so users can migrate and test all of their scheduled jobs one-by-one within a single Project.
category: Feature
icon: feature
tags:
- spark
- cml
- ml
link: https://community.cloudera.com/t5/What-s-New-Cloudera/Apache-Spark-3-is-now-available-in-Cloudera-Machine-Learning/ba-p/331584
date: '2021-09-21T00:00:00Z'
targetAudiences: ["PC", "PVC"]
- title: Applied ML Prototypes (AMPs)
description: |-
Applied ML Prototypes (AMPs) provide reference example machine learning projects in Cloudera Machine Learning. More than simplified quickstarts or tutorials, AMPs are fully-developed expert solutions created by Cloudera’s research arm, Fast Forward Labs.
These solutions to common problems in the machine learning field demonstrate how to fully use the power of Cloudera Machine Learning. AMPs show you how to create CML projects to solve your own use cases.
AMPs are available to install and run from the CML user interface. As new AMPs are developed, they will become available to you for your study and use.
category: Feature
icon: feature
tags:
- ffl
- amps
- cml
- ml
link: https://cloudera.github.io/Applied-ML-Prototypes/#/
imgpath: https://cloudera.github.io/Applied-ML-Prototypes/images/hero.jpg
date: '2021-02-03T00:00:00Z'
targetAudiences: ["PC", "PVC"]