by William Ayd and Matthew Harrison
Practical recipes for scientific computing, time series, and exploratory data analysis using Python
This is the code repository for Pandas Cookbook, Third Edition, published by Packt.
The pandas library is massive, and it's common for frequent users to be unaware of many of its more impressive features. The official pandas documentation, while thorough, does not contain many useful examples of how to piece together multiple commands as one would do during an actual analysis. This book guides you, as if you were looking over the shoulder of an expert, through situations that you are highly likely to encounter.
With this latest edition, unlock the full potential of pandas 2.x and beyond. Whether you're a beginner or an experienced data analyst, this book offers a wealth of practical recipes to help you excel in your data analysis projects. This cookbook covers everything from fundamental data manipulation tasks to advanced techniques for handling big data, visualization, and more. Each recipe is designed to address common real-world challenges, providing clear explanations and step-by-step instructions to guide you through the process.
Explore cutting-edge topics such as idiomatic pandas coding, efficient handling of large datasets, and advanced data visualization techniques. Whether you're looking to sharpen or expand your skills, the Pandas Cookbook is your essential companion for mastering data analysis and manipulation with pandas 2.x and beyond.
What you will learn:
- Navigate the pandas type system effectively
- Import/export DataFrames to/from common data formats
- Explore data in pandas through dozens of practice problems
- Group, aggregate, transform, reshape, and filter data
- Merge data from different sources using pandas' SQL-like operations
- Leverage pandas' robust time series functionality in advanced analyses
- Scale pandas operations to get the most out of your system
- Work with the large ecosystem that pandas can coordinate with and supplement
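As a taste of how several of the topics listed above fit together, here is a minimal sketch that is not taken from the book. It assumes pandas 2.x is installed, and the `sales` and `stores` DataFrames and their column names are invented for illustration. It uses pandas' nullable `"string"` and `"Int64"` data types, a SQL-like left join with `merge`, a `groupby` aggregation, and time series resampling.

```python
import pandas as pd

# Hypothetical sample data (invented for illustration): daily units sold per store.
# "Int64" is pandas' nullable integer type, so the None becomes <NA>.
sales = pd.DataFrame(
    {
        "date": pd.date_range("2024-01-01", periods=6, freq="D"),
        "store": pd.Series(["A", "B", "A", "B", "A", "B"], dtype="string"),
        "units": pd.Series([3, 5, None, 2, 4, 1], dtype="Int64"),
    }
)

# Hypothetical lookup table mapping each store to a region.
stores = pd.DataFrame(
    {"store": pd.Series(["A", "B"], dtype="string"), "region": ["North", "South"]}
)

# SQL-like left join on the shared "store" column.
merged = sales.merge(stores, on="store", how="left")

# Group by region and total the units; missing values are skipped by sum().
units_by_region = merged.groupby("region")["units"].sum()

# Time series: use the dates as the index and total units in 3-day buckets.
units_every_3_days = merged.set_index("date")["units"].resample("3D").sum()

print(merged.dtypes)
print(units_by_region)
print(units_every_3_days)
```

Using the nullable `"Int64"` type keeps the missing value as `<NA>` instead of silently converting the whole column to floats, which is one of the type-system trade-offs the book explores.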
Chapters:
- pandas Foundations
- Selection and Assignment
- Data Types
- The pandas I/O System
- Algorithms and How to Apply Them
- Visualization
- Reshaping DataFrames
- Group By
- Temporal Data Types and Algorithms
- General Usage/Performance Tips
- The pandas Ecosystem
The code in this book will make use of the pandas, NumPy, and PyArrow libraries. Jupyter Notebook files are also a popular way to visualize and inspect code. All of these libraries should be installable via pip or the package manager of your choice. For pip users, you can run:
python -m pip install pandas numpy pyarrow notebook
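To confirm that the installation worked, a short script like the one below can report which of these packages import successfully and their versions. This is a minimal sketch, not part of the book's repository; the helper name `installed_versions` is hypothetical.

```python
import importlib

# Import names of the packages installed with the pip command above.
PACKAGES = ["pandas", "numpy", "pyarrow", "notebook"]


def installed_versions(names):
    """Hypothetical helper: map each package name to its version string.

    The value is None if the package cannot be imported, or "unknown"
    if the module does not expose a __version__ attribute.
    """
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None  # not installed (or not importable)
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:10} {version or 'NOT INSTALLED'}")
```

You can save this as a script and run it with Python, or paste it into a notebook cell; any package reported as NOT INSTALLED should be installed again before working through the recipes.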
The suggested way to work through this book is to have a Jupyter notebook up and running so that you can run the code while reading through the recipes. Following along on your computer allows you to go off exploring on your own and gain a deeper understanding than you would by reading the book alone.
After installing Jupyter Notebook, open a command prompt (type cmd in the search bar on Windows, or open Terminal on Mac or Linux) and type:
jupyter notebook
If you see anything in the notebooks that doesn't run as expected, raise an issue, and we'll be glad to work on it and provide support!
If you feel this book is for you, get your copy today!
Join our community's Discord space to ask questions, provide solutions to other readers, have discussions with the authors, and much more.
If you have already purchased a print or Kindle version of this book, you can get a DRM-free PDF version at no cost. Simply click here to claim your Free PDF.
We also provide a PDF file that has color images of the screenshots/diagrams used in this book at ColorImages.
William Ayd is a core maintainer of the pandas project, serving in that role since 2018. Having worked as a consultant for over a decade, Will has helped countless clients get the most value from their data using pandas and the open-source ecosystem surrounding it.
Matthew Harrison has been using Python since 2000. He runs MetaSnake, which provides corporate training for Python and data science. He is the author of Machine Learning Pocket Reference, the bestselling Illustrated Guide to Python 3, and Learning the Pandas Library, among other books.