Using Python, NoSQL (MongoDB & Elasticsearch) and R for pipeline structure
This project puts into practice the idea of pipelining individual tools, languages and skills so that each is used for what it does best. The tools used are Python, NoSQL databases, and R.
Python is primarily a scripting language that is well suited to data extraction, manipulation and web development. Here we use it to capture structured and unstructured data from the web and bring it into a consistent format that is easy to work with. The structured data is then sent over a NoSQL database connection to MongoDB, where it is managed and updated as required. From there the data is passed to Elasticsearch, whose default indexing makes searching it faster. R then connects to the database from the other end, completing a virtual pipeline flow. The data imported into R is used for exploratory data analysis and prediction, ranging from descriptive to prescriptive analysis, with a focus on obtaining summary statistics across multiple variables (principal variables) and applying suitable, efficient models after testing them for accuracy.
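The formatting step described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the field names (`date`, `home`, `away`, `score`), the date format, and the sample rows with placeholder team names are all assumptions made up for the example.

```python
from datetime import datetime


def normalize_match(raw):
    """Turn one loosely formatted scraped record into a uniform document.

    The input keys and the day/month/year date format are assumptions
    for illustration; a real scraper would define its own.
    """
    home_goals, away_goals = (int(x) for x in raw["score"].strip().split("-"))
    return {
        "date": datetime.strptime(raw["date"].strip(), "%d/%m/%Y").date().isoformat(),
        "home_team": raw["home"].strip(),
        "away_team": raw["away"].strip(),
        "home_goals": home_goals,
        "away_goals": away_goals,
    }


# Hypothetical scraped rows with stray whitespace, as often seen in web data.
raw_rows = [
    {"date": " 21/08/2016", "home": "Team A ", "away": "Team B", "score": "2-1"},
    {"date": "28/08/2016", "home": "Team B", "away": " Team C", "score": " 0-0 "},
]

documents = [normalize_match(row) for row in raw_rows]
```

A list of plain dictionaries such as `documents` is the shape that a document store like MongoDB accepts for a bulk insert (for example via pymongo's `insert_many`), which is why the sketch stops at that point.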
Pipelining Big Data tools for future-ready, better data science
Predicting the win, loss or draw outcome of each game with some accuracy, for home as well as away games, in head-to-head fixtures
- The team data selected for this purpose is restricted to La Liga, the Spanish competition between 20 first-division teams.
Data source: Spanish_Team_Data
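To make the prediction target concrete, the sketch below derives a win/loss/draw label from a match score and tallies past outcomes for one head-to-head fixture. It is only a naive frequency baseline under assumed field names (`home_team`, `away_team`, `home_goals`, `away_goals`) and made-up sample matches; it is not the model used in this project, which is fitted in R.

```python
from collections import Counter


def outcome(home_goals, away_goals):
    """Label a match from the home side's point of view."""
    if home_goals > away_goals:
        return "home_win"
    if home_goals < away_goals:
        return "away_win"
    return "draw"


def head_to_head(matches, home_team, away_team):
    """Relative frequency of each outcome when home_team hosts away_team."""
    counts = Counter(
        outcome(m["home_goals"], m["away_goals"])
        for m in matches
        if m["home_team"] == home_team and m["away_team"] == away_team
    )
    total = sum(counts.values())
    if total == 0:
        return {}  # no past meetings, so no estimate
    return {label: n / total for label, n in counts.items()}


# Hypothetical past results with placeholder team names.
matches = [
    {"home_team": "Team A", "away_team": "Team B", "home_goals": 2, "away_goals": 1},
    {"home_team": "Team A", "away_team": "Team B", "home_goals": 1, "away_goals": 1},
    {"home_team": "Team B", "away_team": "Team A", "home_goals": 0, "away_goals": 3},
]

estimate = head_to_head(matches, "Team A", "Team B")
```

Keeping the home and away roles separate in the lookup reflects the goal stated above of predicting home and away games individually, since the same pair of teams can behave differently depending on who hosts.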