This is a structured study of big data technologies. It's aimed at being reference material for people starting out in studying open source big data projects. The information here is available elsewhere, but it's collated in such a way that it can be used for comparison and for getting the whole picture of an open source project at a glance.
Okay, so I asked myself: what information about an open source big data project would help in 'getting' what the project is about? I put down a bunch of questions. The intention of this repo is to answer these questions for each project that we cover.
- Project Name
- Version
- License
- Released on
- Organization that made it
- Backed by known companies / individuals
- Size of the community
- Is the project in active development? (How many issues were closed recently, how many merge requests were closed recently, how many commits happened recently?)
- Is this project safe to put in production?
- Are there major known production deployments of this project?
- Known big deployments and numbers
- What is its use and reason for existence?
- Which other relevant tools does it play well with?
- Which proprietary tools does it play well with?
- What languages are supported, and which one has first-class support?
- Competitors in the same space
- Paid vs. open source comparison, if a paid version exists
- Is this part of any ecosystem?
- What is the 'too big for the tool' scenario -- when would you definitely not use this tool?
- What are good data loads that this product handles?
- What is the overkill scenario -- when is this tool just too big for the problem and shouldn't be used?
- Comparative benchmarks
- What's good about this?
- Reason for concern?
- Vulnerabilities
- Review articles
- Praise articles
- Complaint articles
- Other projects by the creators
- How to install
- Hello world
- [ ] Cassandra
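
As a rough illustration, a few of the questions above could be kept as a structured record per project, which makes it easy to see what is still unanswered. This is only a sketch: the `ProjectProfile` class, its field names, and the `unanswered` helper are hypothetical and not part of this repo, and no values are filled in for any real project.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ProjectProfile:
    """Hypothetical record covering a handful of the questions in the list."""
    name: str
    version: Optional[str] = None
    license: Optional[str] = None
    released_on: Optional[str] = None
    organization: Optional[str] = None
    # Activity counts over some recent window; the window length is left to the author.
    issues_closed_recently: Optional[int] = None
    merge_requests_closed_recently: Optional[int] = None
    commits_recently: Optional[int] = None


def unanswered(profile: ProjectProfile) -> List[str]:
    """Return the names of fields that have not been filled in yet."""
    return [f.name for f in fields(profile) if getattr(profile, f.name) is None]


# A fresh entry has only its name; everything else is still to be researched.
profile = ProjectProfile(name="Cassandra")
print(unanswered(profile))
```

The remaining questions could be added as further optional fields in the same way, so that each project write-up can be compared field by field.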