Statistics is the branch of science that deals with collecting, organizing, analyzing, interpreting, and presenting data. The field of statistics is broadly divided into two parts:
- Descriptive Statistics: It deals with collecting, analyzing, and summarizing the data.
- Inferential Statistics: It is a technique that draws conclusions about the whole data (population) by observing a small amount of data (sample), as illustrated in the sketch after this list.
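
A minimal sketch of the two ideas in Python, assuming a hypothetical sample of house prices (the numbers are simulated stand-ins, not from any of the Kaggle datasets used later):

```python
import numpy as np
from scipy import stats

# Hypothetical sample of 50 house prices (in $1000s), simulated as a stand-in dataset
rng = np.random.default_rng(42)
sample = rng.normal(loc=300, scale=50, size=50)

# Descriptive statistics: summarize the data we actually have
print("mean:", np.mean(sample))
print("median:", np.median(sample))
print("std dev:", np.std(sample, ddof=1))

# Inferential statistics: draw a conclusion about the population from the sample,
# e.g. a 95% confidence interval for the (unknown) population mean
ci = stats.t.interval(0.95, df=len(sample) - 1,
                      loc=np.mean(sample),
                      scale=stats.sem(sample))
print("95% CI for population mean:", ci)
```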
Below, I have tried to summarize the first few chapters of this book with the help of Jupyter Notebook files, using various datasets obtained from Kaggle and other data sources.
This chapter focuses on the first step in any data science project: exploring the data. Exploratory data analysis, or EDA, is a comparatively new area of statistics. In 1962, John W. Tukey called for a reformation of statistics in his seminal paper “The Future of Data Analysis” [Tukey-1962]. With the ready availability of computing power and expressive data analysis software, exploratory data analysis has evolved well beyond its original scope. Key drivers of this discipline have been the rapid development of new technology, access to more and bigger data, and the greater use of quantitative analysis in a variety of disciplines.
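
As a rough idea of what that first exploration step looks like in a notebook, here is a minimal sketch, assuming a hypothetical CSV file named "housing.csv" (any of the Kaggle datasets could be substituted):

```python
import pandas as pd

# Load a hypothetical dataset and take a first look at its structure
df = pd.read_csv("housing.csv")

print(df.shape)           # number of rows and columns
print(df.dtypes)          # data type of each column
print(df.head())          # first few records
print(df.describe())      # estimates of location and variability for numeric columns
print(df.isna().sum())    # missing values per column

# Quick look at distributions and pairwise relationships
df.hist(figsize=(10, 6))
print(df.corr(numeric_only=True))
```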
The concepts we will discuss in this chapter are data and sampling distributions. Traditional statistics focused very much on using theory based on strong assumptions about the population. Modern statistics has moved to sampling procedures, where such assumptions are not needed. In general, data scientists need not worry about the theoretical nature of the population and should instead focus on the sampling procedures and the data at hand. There are some notable exceptions. Sometimes data is generated from a physical process that can be modeled. The simplest example is flipping a coin: this follows a binomial distribution. Any real-life binomial situation (buy or don't buy, fraud or no fraud, click or don't click) can be modeled effectively by a coin (with a modified probability of landing heads, of course). In these cases, we can gain additional insight by using our understanding of the population.
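
A short sketch of the coin-flip idea, assuming a hypothetical "click / don't click" event with a made-up click probability; it compares the simulated sampling distribution of the click rate with the theoretical standard error:

```python
import numpy as np

rng = np.random.default_rng(0)

# A "click / don't click" event modeled as a biased coin with p = 0.1 (assumed value)
p, n_trials = 0.1, 1000
clicks = rng.binomial(n=1, p=p, size=n_trials)
print("observed click rate:", clicks.mean())

# Sampling distribution of the mean: draw many samples and look at the spread
sample_means = [rng.binomial(n=1, p=p, size=n_trials).mean() for _ in range(5000)]
print("std error of click rate (simulated):", np.std(sample_means))
print("std error of click rate (theory):   ", np.sqrt(p * (1 - p) / n_trials))
```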
Design of experiments is a cornerstone of the practice of statistics, with applications in virtually all areas of research. The goal is to design an experiment in order to confirm or reject a hypothesis. Data scientists often need to conduct continual experiments, particularly regarding user interface and product marketing. This chapter reviews traditional experimental design and discusses some common challenges in data science. It also covers some oft-cited concepts in statistical inference and explains their meaning and relevance (or lack of relevance) to data science.
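
To make the experiment idea concrete, here is a minimal sketch of an A/B test evaluated with a permutation test, using simulated conversion data (the conversion rates and sample sizes are assumptions, not results from the book):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical A/B test: conversions for two page designs (1 = converted, 0 = not)
page_a = rng.binomial(1, 0.11, size=1000)
page_b = rng.binomial(1, 0.13, size=1000)
observed_diff = page_b.mean() - page_a.mean()

# Permutation test: shuffle the pooled data and recompute the difference many times
pooled = np.concatenate([page_a, page_b])
perm_diffs = []
for _ in range(5000):
    rng.shuffle(pooled)
    perm_diffs.append(pooled[:1000].mean() - pooled[1000:].mean())

# p-value: how often a random relabeling produces a difference at least this large
p_value = np.mean(np.abs(perm_diffs) >= abs(observed_diff))
print("observed difference:", observed_diff)
print("p-value:", p_value)
```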
- Visual Studio Code (VS Code) [https://code.visualstudio.com/download]
- Python (Version 3.8.5) [https://www.python.org/downloads/]
- Jupyter Notebook [available in VS Code as part of the Jupyter extension]
- Shashank Kalanithi: https://www.youtube.com/watch?v=wwsizzg6UjU&list=PL-u09-6gP5ZNd6AhULnQHr6ZsF15qy4D0
- Krish Naik: https://www.youtube.com/watch?v=y1y1ATTMpaw
- Derek Banas: https://youtu.be/tcusIOfI_GM
- Khan Academy: https://www.youtube.com/watch?v=uhxtUt_-GyM&list=PL1328115D3D8A2566
- Code Basics: https://www.youtube.com/watch?v=8ZI55Inh1_A&list=PLeo1K3hjS3uuKaU2nBDwr6zrSOTzNCs0l
- Code with Harry: https://www.youtube.com/watch?v=gfDE2a7MKjA