- DCPAM goal is to deliver full range of Data Warehouse possibilities without need to include or hire more engineers for this task.
- DCPAM architecture is highly flexible and provides unlimited scaling possibilities.
- DCPAM is multiplatform solution: on-premise (Linux/Windows), cloud (Private, Microsoft Azure, Amazon Web Services, Google Cloud Platform) and hybrid deployments are possible.
- DCPAM is hardware- and database-agnostic with component-based architecture.
-
Open source, cost-effective.
-
DCPAM helps to create single central repository of integrated company data - this provides a single integrated view of an organisation.
-
All informations are always up to date - Managers can respond rapidly to ongoing changes in the business environment to make data-driven decisions.
-
Data structures are designed in a uniform way - much less effort is needed to prepare and access requested informations.
-
No wide range of advanced technical knowledge is needed to make DCPAM work - installation is straightforward and system configuration can be handled by any person with SQL background and analytic insight of company data.
-
Process gigabytes of data within minutes - benefit of parallel execution.
-
Flexible deployment - use your own infrastructure or get the benefits from cloud platforms (Private, Microsoft Azure, Amazon Web Services, Google Cloud Platform). DCPAM Data Warehouse Solution is hardware- and database-agnostic with component-based architecture.
DCPAM is responsible for copying data from one or more sources into a destination system. That process consists of three steps: Extraction, Transformation and Load, with Transform and Load in various configurations: Extract-Transform-Load (ETL) or Extract-Load-Transform (ELT).
- For detailed information about ETL and ELT pipelines, please refer to DCPAM ETL documentation.
- More specific description of data transformation possibilities can be found in DCPAM RDP documentation.
Data in the Warehouse can be accessed directly at the database level with any system for analytics, such as Power BI, Tableau, Redash and others. For complex architectures with more than one DCPAM Database node, DCPAM Warehouse Data Server is dedicated data access point. For more details, please refer to DCPAM WDS documentation.
All of the DCPAM Data Warehouse operations are monitored by DCPAM Live Component State. See the documentation to read more on this subject.
As DCPAM is extremely modular and highly scalabe, it can serve both as Data Warehouse and number of dedicated Data Marts.
- JSON-based source systems configuration.
- SQL and JSON-based configuration of the ETL processes:
- Data extraction:
- Inserted data
- Deleted data
- Modified data
- Staging Area:
- optional
- placed locally in DCPAM Database
- placed in external database
- Data transformation:
- optional
- handled locally (in relation to the Staging Area)
- handled remotely (in relation to the Staging Area)
- Data load from:
- local Staging Area
- remote Staging Area
- results of the Extract subprocess
- Data extraction:
- Parallel execution:
- By design:
- Each ETL process runs in separate thread.
- By running multiple instances of DCPAM:
- On the same server
- On many disparate servers
- By design:
- SQL and JSON-based preconfigured queries for data analysis.
- Data Warehouse and Data Marts:
- One or many instances of DCPAM can work as Data Warehouse (extracting and processing data from disparate sources).
- In the same time different DCPAM instances can use Data Warehouse to feed Data Marts with specific business-oriented data.
- Data sources:
- Data Warehouse monitoring:
- DCPAM ETL
- DCPAM RDP
- DCPAM WDS
- DCPAM LCS
- Access DCPAM Database with any system for analytics (Power BI, Tableau, Redash etc.).
- DCPAM Admin web application:
- Manage data sources.
- Configure ETL processes.
- Manage DCPAM BI users.
- DCPAM Monitoring (web application for DCPAM LCS).
- DCPAM LCS Notifications
- Data sources:
- LDAP
- Cassandra
- Choose Data Warehouse DBMS, sufficient hardware and disk space.
- Consider Data Warehouse tables schema:
- Project data structures:
- Staging Area
- Target tables
- Indexes
- Views
- ...and much more.
DCPAM Data Warehouse Solution is hardware- and database-agnostic with component-based architecture.
DCPAM Data Warehouse Solution consists of a number of integrated components. Each component has unique log file, and DCPAM ETL - due to parallel execution - can create many log files.
DCPAM ETL is the main ETL/ELT engine. Each running instance can handle many ETL/ELT processes simultaneously. Moreover, many instances of DCPAM ETL can work within single Data Warehouse.
DCPAM WDS is dedicated Massively Parallel Processing service for querying predefined and custom business data with in-memory caching system. DCPAM WDS cluster encapsulates every database into a single data access point. This component can be used without entire DCPAM Data Warehouse Solution and operate as a stand-alone solution.
DCPAM RDP is used by DCPAM ETL to execute transform scripts/applications that must be run on separate machines when performance impact is significant. DCPAM ETL communicates with these remote scripts/applications through DCPAM RDP.
DCPAM workflow overview
Example of DCPAM Extract-Transform-Load / Extract-Load-Transform process scalability:
- Multiple DCPAM ETL engine nodes can be run within single Data Warehouse.
- Each DCPAM ETL instance can trigger unlimitend number of local and remote (through DCPAM RDP) data transformation scripts/applications.
- Each DCPAM ETL instance can use dedicated Staging Area node (local or remote).
- Many DCPAM Database nodes can be encapsulated into single data access point by DCPAM WDS.
- DCPAM ETL, Staging Area, DCPAM Database and DCPAM WDS can run on single server as well!
Following diagrams represents different configurations to deploy both Data Warehouse and Data Marts.
Data Warehouse with DCPAM
Data Marts with DCPAM Data Warehouse
DCPAM LCS is the central repository of informations about all the DCPAM Components current state. It is the most important DCPAM Component from an administrative point of view.
DCPAM Construct is going to be the main system administration web application.
DCPAM Monitoring is going to be web application for DCPAM LCS data presentation.
DCPAM development is still in progress with following data sources available:
Data source | Support |
---|---|
PostgreSQL | native |
MySQL 8 | native |
MariaDB/MySQL 5 | native |
SQL Server/Azure SQL Database | native via ODBC* |
Oracle Database | native |
SQLite3 | native |
IBM Db2 | ODBC |
Microsoft Access | ODBC |
MongoDB | native |
* SQL Server/Azure SQL Database: ODBC is the primary native data access API for applications written in C and C++ for SQL Server.
Please note that DCPAM provides support for every ODBC-compliant data source.
Data source | Support |
---|---|
CSV | native |
TSV | native |
PSV | native |
JSON | native |
XLS | script |
XLSX | script |
XLSM | script |
XLSB | script |
ODF | script |
ODS | script |
XML | script |
DCPAM can access files and other resources from local or remote locations. The latter are fetched via HTTP/HTTPS protocol and Battery HTTP Server is recommended. Parameters such as URL, method, headers and additional payload are supported. Files are loaded to temporary tables in DCPAM or external database to make SQL operations possible for this kind of data.
DCPAM is designed to be as most customizable as it needs to be. Therefore every database listed above as available data source can also be used as DCPAM target database.
Each DCPAM Component is provided with Dockerfile
to build Docker image:
- DCPAM ETL:
dcpam-etl.dockerfile
- DCPAM RDP:
dcpam-rdp.dockerfile
- DCPAM WDS:
dcpam-wds.dockerfile
- DCPAM LCS:
dcpam-lcs.dockerfile
- DCPAM WDS: cache TTL.
- DCPAM ETL: LDAP support.
Under development.
Year | Quarter | Feature | Status |
---|---|---|---|
2020 | Q3 | ✅ Transform process | Done (2020-08-07) |
2020 | Q3 | ✅ DCPAM WDS | Done (2020-08-17) |
2020 | Q3 | ✅ DCPAM auth keys | Done (2020-08-19) |
2020 | Q3 | ✅ Data source: CSV/TSV/PSV | Done (2020-08-30) |
2020 | Q3 | ✅ Remote files | Done (2020-09-01) |
2020 | Q3 | ✅ Data source: JSON | Done (2020-09-03) |
2020 | Q3 | ✅ DCPAM LCS | Done (2020-09-15) |
2020 | Q3 | ✅ Docker images for all the DCPAM Components | Done (2020-09-20) |
2020 | Q3 | ✅ HTTPS support | Done (2020-09-22) |
2020 | Q3 | ✅ Data source: externally preprocessed | Done (2020-09-24) |
2020 | Q3 | ✅ Data source: XML | Done (2020-09-24) |
2020 | Q3 | ✅ DCPAM ETL connectivity tests | Done (2020-09-25) |
2021 | Q1 | ✅ Data source: APIs | Done (2021-02-01) |
2021 | Q1 | ✅ DCPAM WDS true Multi Parallel Processing | Done (2021-02-02) |
2021 | Q1 | ✅ DCPAM WDS: horizontal scaling | Done (2021-02-11) |
2021 | Q1 | ✅ DCPAM WDS: Advanced Query Cache | Done (2021-02-14) |
2021 | Q1 | ✅ Data source: MongoDB | Done (2021-02-27) |
2021 | Q1 | Data source: LDAP | In progress |
2021 | Q1 | DCPAM WDS: cache TTL | In progress |
2021 | Q1 | DCPAM ETL: report workflow failures to DCPAM LCS | Not implemented |
2021 | Q1 | Read complex SQL queries from files | Not implemented |
2021 | Q2 | DCPAM Monitoring | Not implemented |
2021 | Q4 | DCPAM Construct (admin web application) | Not implemented |
2022 | Q1 | DCPAM AI Platform | Not implemented |
This software uses: