Statistics is the branch of science that deals with collecting, organizing, analyzing, interpreting, and presenting data. The field of statistics is broadly divided into two parts:
- Descriptive Statistics: It deals with collecting, analyzing, and summarizing the data.
- Inferential Statistics: It is a technique that draws conclusions about the whole data (population) by observing a small amount of data (sample), as illustrated in the sketch after this list.
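
A minimal sketch of the two ideas in Python, assuming a hypothetical sample of house prices (the numbers are simulated stand-ins, not from any of the Kaggle datasets used later):

```python
import numpy as np
from scipy import stats

# Hypothetical sample of 50 house prices (in $1000s), simulated as a stand-in dataset
rng = np.random.default_rng(42)
sample = rng.normal(loc=300, scale=50, size=50)

# Descriptive statistics: summarize the data we actually have
print("mean:", np.mean(sample))
print("median:", np.median(sample))
print("std dev:", np.std(sample, ddof=1))

# Inferential statistics: draw a conclusion about the population from the sample,
# e.g. a 95% confidence interval for the (unknown) population mean
ci = stats.t.interval(0.95, df=len(sample) - 1,
                      loc=np.mean(sample),
                      scale=stats.sem(sample))
print("95% CI for population mean:", ci)
```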
Below, I have tried to summarize the first few chapters of this book with the help of Jupyter Notebook files, using various datasets obtained from Kaggle and other data sources.
This chapter focuses on the first step in any data science project: exploring the data. Exploratory data analysis, or EDA, is a comparatively new area of statistics. In 1962, John W. Tukey called for a reformation of statistics in his seminal paper “The Future of Data Analysis” [Tukey-1962]. With the ready availability of computing power and expressive data analysis software, exploratory data analysis has evolved well beyond its original scope. Key drivers of this discipline have been the rapid development of new technology, access to more and bigger data, and the greater use of quantitative analysis in a variety of disciplines.
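
As a rough idea of what that first exploration step looks like in a notebook, here is a minimal sketch, assuming a hypothetical CSV file named "housing.csv" (any of the Kaggle datasets could be substituted):

```python
import pandas as pd

# Load a hypothetical dataset and take a first look at its structure
df = pd.read_csv("housing.csv")

print(df.shape)           # number of rows and columns
print(df.dtypes)          # data type of each column
print(df.head())          # first few records
print(df.describe())      # estimates of location and variability for numeric columns
print(df.isna().sum())    # missing values per column

# Quick look at distributions and pairwise relationships
df.hist(figsize=(10, 6))
print(df.corr(numeric_only=True))
```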
The concepts we will discuss in this chapter are data and sampling distributions. Traditional statistics focused very much on using theory based on strong assumptions about the population. Modern statistics has moved to sampling procedures, where such assumptions are not needed. In general, data scientists need not worry about the theoretical nature of the population and should instead focus on the sampling procedures and the data at hand. There are some notable exceptions. Sometimes data is generated from a physical process that can be modeled. The simplest example is flipping a coin: this follows a binomial distribution. Any real-life binomial situation (buy or don't buy, fraud or no fraud, click or don't click) can be modeled effectively by a coin (with a modified probability of landing heads, of course). In these cases, we can gain additional insight by using our understanding of the population.
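
A short sketch of the coin-flip idea, assuming a hypothetical "click / don't click" event with a made-up click probability; it compares the simulated sampling distribution of the click rate with the theoretical standard error:

```python
import numpy as np

rng = np.random.default_rng(0)

# A "click / don't click" event modeled as a biased coin with p = 0.1 (assumed value)
p, n_trials = 0.1, 1000
clicks = rng.binomial(n=1, p=p, size=n_trials)
print("observed click rate:", clicks.mean())

# Sampling distribution of the mean: draw many samples and look at the spread
sample_means = [rng.binomial(n=1, p=p, size=n_trials).mean() for _ in range(5000)]
print("std error of click rate (simulated):", np.std(sample_means))
print("std error of click rate (theory):   ", np.sqrt(p * (1 - p) / n_trials))
```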
Design of experiments is a cornerstone of the practice of statistics, with applications in virtually all areas of research. The goal is to design an experiment in order to confirm or reject a hypothesis. Data scientists often need to conduct continual experiments, particularly regarding user interface and product marketing. This chapter reviews traditional experimental design and discusses some common challenges in data science. It also covers some oft-cited concepts in statistical inference and explains their meaning and relevance (or lack of relevance) to data science.
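
To make the experiment idea concrete, here is a minimal sketch of an A/B test evaluated with a permutation test, using simulated conversion data (the conversion rates and sample sizes are assumptions, not results from the book):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical A/B test: conversions for two page designs (1 = converted, 0 = not)
page_a = rng.binomial(1, 0.11, size=1000)
page_b = rng.binomial(1, 0.13, size=1000)
observed_diff = page_b.mean() - page_a.mean()

# Permutation test: shuffle the pooled data and recompute the difference many times
pooled = np.concatenate([page_a, page_b])
perm_diffs = []
for _ in range(5000):
    rng.shuffle(pooled)
    perm_diffs.append(pooled[:1000].mean() - pooled[1000:].mean())

# p-value: how often a random relabeling produces a difference at least this large
p_value = np.mean(np.abs(perm_diffs) >= abs(observed_diff))
print("observed difference:", observed_diff)
print("p-value:", p_value)
```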
- Visual Studio Code (VS Code) [https://code.visualstudio.com/download]
- Python (Version 3.8.5) [https://www.python.org/downloads/]
- Jupyter Notebook [available in VS Code as part of the Jupyter extension]
- Shashank Kalanithi: https://www.youtube.com/watch?v=wwsizzg6UjU&list=PL-u09-6gP5ZNd6AhULnQHr6ZsF15qy4D0
- Krish Naik: https://www.youtube.com/watch?v=y1y1ATTMpaw
- Derek Banas: https://youtu.be/tcusIOfI_GM
- Khan Academy: https://www.youtube.com/watch?v=uhxtUt_-GyM&list=PL1328115D3D8A2566
- Code Basics: https://www.youtube.com/watch?v=8ZI55Inh1_A&list=PLeo1K3hjS3uuKaU2nBDwr6zrSOTzNCs0l
- Code with Harry: https://www.youtube.com/watch?v=gfDE2a7MKjA