DIPS is an approach for analytics projects of any kind. It is designed to guide teams of any experience level through Big Data projects.
DIPS stands for Data Insights Project Stencil. It includes the MISS Card, which stands for Management, Inputs, Solution, & Scoring Card. Both techniques support the successful completion of analytics projects. DIPS is designed to cover
- "classic" reporting and BI projects
- as well as Advanced Analytics
- Exploratory Data Analysis (EDA) and
- Machine Learning (ML) projects.
DIPS was designed to unify the established world of reports with the upcoming needs of Big Data projects. It is inspired by well-established techniques from Data Mining and builds on them to form an approach that handles Data Science projects of any kind and complexity.
DIPS = {MISS Card, DIPS Overview, DIPS Timeline}
Business challenge → Project String → Project Overview with Phases → Project Timeline → specific technical tasks
The MISS Card is part of every DIPS project, regardless of the Project String.
- To use DIPS in a project environment, the C-level executives start with the MISS Card. This step determines only the Project String, that is, what to do.
- Based on this Project String, the task at hand, the designated project lead supports the C-level executives in filling out the MISS Card. This card is the basis for answering the questions that usually come up during the project. Filling it out in advance ensures that very little input from top management is needed during the operational execution, because every key aspect of the project is handed out on the first day.
- During the operational execution, the MISS Card together with the DIPS Project String is the first point of reference for any questions that may come up. The DIPS Project String defines the order of execution for novices and serves as a checklist for experienced Data Scientists. The MISS Card contains information specific to this exact project, such as data sources, the destination for data/models, the due date, and what platform to use.
The DIPS Overview essentially consists of the three Project Strings, one String for each kind of data-related project.
There are three Project Strings: Report, Exploratory Data Analysis, and Machine Learning. Each of the Strings serves a clearly defined purpose. The 'right' String depends on the business challenge or question alone.
Every String has exactly four Phases: the first is always Initialize, the second is Transform, the third depends on the Project String, and the fourth is always Completion. The third Phase always carries the same name as the Project String itself.
String Report
String Exploratory Data Analysis
String Machine Learning
The first Phase is always the same, down to the order of its tasks. The following three Phases differ from one String to another, although there may be parallels between Exploratory Data Analysis and Machine Learning, which stem from their nature.
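The Phase layout described above can be written down as a minimal sketch. The names `ProjectString` and `phases` are illustrative assumptions and not part of DIPS itself; only the String and Phase names come from the text.

```python
from enum import Enum


class ProjectString(Enum):
    """The three Project Strings named in the DIPS Overview."""
    REPORT = "Report"
    EDA = "Exploratory Data Analysis"
    ML = "Machine Learning"


def phases(string: ProjectString) -> list[str]:
    """Return the four Phases of a Project String, in order."""
    # The third Phase carries the same name as the Project String itself.
    return ["Initialize", "Transform", string.value, "Completion"]


for s in ProjectString:
    print(s.name, "->", " > ".join(phases(s)))
```

Every String yields exactly four Phases, and only the third entry differs between Strings.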
MISS Card stands for Management, Inputs, Solution, & Scoring Card. Management and the project lead define the key aspects here, such as data sources, the destination for data/models, the due date, whom to report to, and what platform to use.
This card is designed to contain the aspects specific to this exact project. Team members and developers consult it first whenever a question comes up. As the most common questions are covered here, the project is likely to see fewer delays, because the time from question to answer is reduced.
From top to bottom:
- Management
  This section of the card is divided into three columns, one for each Project String. Your current Project String defines which column to fill in and therefore which values to gather. There will usually be two columns that you may ignore. Only if you are doing a project with a broad scope or a follow-up project may more than one column be of interest to you. Mostly, exactly one column is what you need.
- Inputs
  Here the project lead, the Data Scientists, and the business stakeholders define what data to use. The data may be pre-processed, value-added, annotated, anonymized, etc. to meet legal, business, and technological requirements. This section of the card defines the data source and the destination for data & models from this endeavor.
- Solution & Scoring
  This section of the card is divided into two parts for one reason: you may decide on a technological solution first. This is appropriate when, for example, your platform is fixed to be Cassandra or there is only an Enterprise Data Hub such as Hadoop. In that case you start by filling in the fixed values and end up with a resulting Scoring for runtime performance, scalability, and quality of visualizations.
  The other way is to start with the latter three: you define your Scoring, such as high runtime performance, high scalability, and best quality of visualizations. From that you derive the technological platform, developers, and frameworks to use.
  The Solution and Scoring parts are therefore strongly bound to each other: defining one fixes the other. It is recommended to define either what is important to your project or what you already know. The values of the other column then result from the values of the column filled in first.
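As a rough illustration, the key aspects a MISS Card collects could be kept in a small data structure. This is a sketch under assumptions: the class `MissCard`, its field names, the helper `open_points`, and all example values are hypothetical and not prescribed by DIPS. Only the kinds of information (data sources, destination, due date, whom to report to, platform, Scoring) are taken from the text.

```python
from dataclasses import dataclass, field, fields
from datetime import date
from typing import Optional


@dataclass
class MissCard:
    """Hypothetical container for the key aspects a MISS Card collects."""
    project_string: str                                    # e.g. "Report"
    data_sources: list[str] = field(default_factory=list)  # Inputs: what data to use
    destination: str = ""                                  # Inputs: where data/models go
    due_date: Optional[date] = None
    report_to: str = ""                                    # whom to report to
    platform: str = ""                                     # Solution: fixed or derived
    scoring: dict[str, str] = field(default_factory=dict)  # Scoring priorities

    def open_points(self) -> list[str]:
        """Names of fields that are still empty and need an answer."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


card = MissCard(
    project_string="Report",
    data_sources=["sales_db"],       # hypothetical source name
    destination="reporting_schema",  # hypothetical destination
    due_date=date(2015, 6, 30),      # hypothetical due date
    platform="Hadoop",               # platform fixed first; Scoring still open
)
print(card.open_points())
```

Listing the open points before the first project day is one way to check that the card really answers the most common questions in advance.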
The Timeline is defined by the Project String alone. You work your way through it first from left to right and then from top to bottom. This defines the order of your tasks that is, in general, the most appropriate. Please keep in mind that specific constraints in your project and arrangements in your company may override the general order. Feel free to re-arrange, add, and remove tasks according to your specific needs.
The DIPS Timeline gives an ordered guideline of the tasks to do. Together with the MISS Card, a developer can work almost autonomously and report the state of their work to the project lead efficiently. As there are always four Phases, a rough estimate of the overall project state can be derived for management reports as well.
Report Timeline
EDA Timeline
Machine Learning Timeline
Why is it that in all three Project Strings the Initialize Phase ends with finding appropriate data sources and the Transform Phase starts with the same thing?
The answer is pretty simple: the Initialize Phase is done by the project leader, while the developers mainly read the Codebook. The Transform Phase is meant for the developers to work on, with the project leader only monitoring from there on. Because finding an appropriate data source saves a lot of time, both the project leader and the developers are meant to watch out for one. This takes little effort and yields a high return if successful. The step is deliberately redundant because some projects have no practical data governance in place, so either party may be the one who is aware of pre-processed data.
- Can I use DIPS to monitor Key Performance Indicators (KPI)?
  - Examples are:
    - What are my top-selling products?
    - How much profit does each department achieve?
    - What is the average load of my warehouse?
  - Yes, you can. The Report Project String is the right choice for that. These are classic BI questions. If there are approaches that fit your IT architecture or data better, consider using them where DIPS is too general. Otherwise, DIPS gives you a guideline for these projects as well. BI questions may be better solved without DIPS because DIPS was created in response to the lack of clear approaches for EDA and ML projects. As BI projects are common, there may be something that fits your needs better than DIPS.
- Can I apply DIPS when there is no specific problem I can see, no KPIs to derive, etc.?
  - You want to become data-driven and proactively resolve challenges before they grow into problems. There is no immediate need to act, but you want to gain a deeper understanding of your business and be sure potential risks are managed right when they come up.
  - Yes, you can. The EDA Project String was designed for just that. A Data Scientist will apply various models, trying to isolate specific instances. There may be patterns that you need to be aware of. This could be anything from a virus in your computer network, to low-quality mechanical parts that increase operational risk and maintenance costs, to methods intended to increase your productivity that work in the opposite direction.
- I know the problem, but there is no straightforward answer. Is DIPS applicable?
  - There is a very specific problem, but no easy answer. You have the feeling that a Data Scientist could at least track the problem and maybe limit or isolate it.
  - Yes, you can use DIPS for this. The Machine Learning String was made just for this. A Data Scientist will look for patterns, as in the EDA String, maybe do an EDA beforehand, and may come up with a magic solution.
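The three questions above suggest a simple decision rule for picking a Project String. The sketch below is one possible reading of them, not an official DIPS rule: the function `choose_project_string` and its two yes/no parameters are assumptions made for illustration.

```python
def choose_project_string(specific_question: bool, straightforward_answer: bool) -> str:
    """Pick a Project String from two yes/no answers about the business challenge.

    Assumed reading of the FAQ above:
    - no specific problem or question in sight  -> Exploratory Data Analysis
    - specific question with a direct answer    -> Report (classic BI / KPI)
    - specific problem without an easy answer   -> Machine Learning
    """
    if not specific_question:
        return "Exploratory Data Analysis"
    if straightforward_answer:
        return "Report"
    return "Machine Learning"


# "What are my top-selling products?" is specific and directly answerable.
print(choose_project_string(specific_question=True, straightforward_answer=True))
```

Real business challenges rarely reduce to two booleans, so treat the result as a starting point for the discussion with management rather than a final decision.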
Order no. | Stakeholders | Step |
---|---|---|
1 | Strategic Management | Determine business need and derive project kind from DIPS Overview |
2 | Strategic Management, Project lead | Management & project lead fill out MISS Card to determine key points of project |
3 | Project lead & members | |
The progress of a project is measured by the Phase it is currently in. As all 3 Project Strings have exactly 4 Phases, progress is quite comparable across projects. As a rough measure, the workload distribution is about 1:2:3:1 for the Phases Initialize, Transform, the Project String specific Phase, and Completion.
Phase | Workload in percent | Dominant roles |
---|---|---|
Initialize | 15% | Project leader |
Transform | 30% | Developers |
Project String specific Phase | 40% | Developers |
Completion | 15% | Project leader |
The first and the last Phase are where the project leader has most of the work to do. In the second and third Phase the developers do most of the work and the project leader mostly monitors their progress.
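The workload percentages from the table can be turned into a rough progress estimate for management reports. This is a minimal sketch: the function `progress_percent` and its `done_in_phase` parameter (the share of the current Phase already finished, between 0 and 1) are assumptions added for illustration.

```python
# Workload share per Phase in percent, taken from the table above.
WORKLOAD = [
    ("Initialize", 15),
    ("Transform", 30),
    ("Project String specific Phase", 40),
    ("Completion", 15),
]


def progress_percent(current_phase: str, done_in_phase: float = 0.0) -> float:
    """Estimate overall progress: finished Phases plus the finished share of the current one."""
    if not 0.0 <= done_in_phase <= 1.0:
        raise ValueError("done_in_phase must be between 0 and 1")
    completed = 0.0
    for name, share in WORKLOAD:
        if name == current_phase:
            return completed + share * done_in_phase
        completed += share
    raise ValueError(f"unknown Phase: {current_phase}")


print(progress_percent("Transform", 0.5))  # halfway through Transform: 30.0
```

With `done_in_phase` left at its default, the function reports only fully finished Phases, which matches the coarse Phase-based measure described above.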
Abbreviation | Long form | Public Abbr. |
---|---|---|
BI | Business Intelligence | Public |
DIPS | Data Insights Project Stencil | |
DSPS | Data Science Project Stencil | |
EDA | Exploratory Data Analysis | Public |
KPI | Key Performance Indicator | Public |
MISS | Management, Inputs, Solution, & Scoring | |
ML | Machine Learning | Public |
DIPS and MISS have come a long way: since their invention, they have been reviewed, optimized, and applied in several projects. With this publication we would like you to test them in your company. You may also modify this stencil to fit your needs.
We are more than thankful for any feedback and for hints on what to improve.
DIPS was invented because there was no project approach for Advanced Analytics projects. It was developed independently, using the experience and knowledge of colleagues. Later, it turned out that there are natural parallels to CRISP-DM. A comparison of the two and their relation are the subject of the following section.
DIPS can be seen as similar to CRISP-DM. DIPS is easier to follow for inexperienced project members because it is closer to a step-by-step guide than CRISP-DM. For that reason DIPS also covers the specific Project Strings Report, Exploratory Data Analysis, and Machine Learning, to fulfil the expectation of being a helpful guide for projects. While CRISP-DM has the same layout and a project would evolve the same way in either model, the MISS Card and the DIPS Timeline make it easy to begin a Data Science project with less experience because the process is thought through. Team members can therefore concentrate on techniques and statistics.
The DIPS Overview for the Report String is what is closest to CRISP-DM. Expressed mathematically: CRISP-DM ⊂ DIPS. CRISP-DM is very suitable for experienced teams or for getting a rough overview, e.g. for management reports.
DIPS (Data Insights Project Stencil) is also known as DSPS (Data Science Project Stencil). MISS Card (Management, Inputs, Solution, & Scoring) is a means to collect key information in one place.
Icons on graphics are taken as is from the Twitter Bootstrap icon package. A big thank you to the guys at Twitter.
Developed at Belvine University. Applied in projects at msg systems ag and Belvine Consulting.
© 2010-2015, Daniel Schulz