Skip to content

AndreiPetrukhin/de-project-sprint-6

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

64 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Project Overview

This project focuses on developing a Data Vault storage solution for data retrieved from Amazon S3, leveraging a robust technology stack to manage and analyze data effectively.

Technology Stack

  • Data Vault: The methodology applied for creating a scalable and flexible data warehouse using the Data Vault architecture.
  • Apache Airflow: Used for orchestrating and automating the workflows for data extraction, loading, and transformation.
  • Vertica DB: Serves as the high-performance SQL database for storing and querying large datasets, utilizing its columnar storage capabilities.
  • S3 API: Utilized for accessing and managing data stored in Amazon S3 buckets.
  • Docker: Employed to containerize and isolate the application environment, ensuring consistency across different development and production settings.

Main Task

The primary objective of this project is to develop a Data Vault-based storage system. This involves:

  • Designing and implementing a scalable Data Vault model within Vertica DB.
  • Extracting data from Amazon S3 using S3 API integrations.
  • Automating data flows and ETL processes with Apache Airflow to populate the Data Vault.
  • Ensuring data consistency and integrity throughout the data lifecycle.
  • Configuring Docker containers to manage the application and its dependencies efficiently.

Goals

  • Efficiency: Optimize data retrieval and storage processes to handle large volumes of data efficiently.
  • Scalability: Design the system to scale seamlessly with increasing data sizes and complexity.
  • Reliability: Ensure high availability and reliability of the data storage and retrieval processes.

This project aims to harness the strengths of each component in the technology stack to create a robust data warehousing solution that supports complex data analysis and business intelligence activities.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages