Hands-on Data Analysis and Visualization with Pandas (ebook)

263 pages, English, ebook, 2020

Written by
RAO KATHULA PURNA CHANDER

Published by
BPB Publications

Publication date

3 September 2020

EAN13

9789389845655

Language

English

Learn how to use JupyterLab, NumPy, pandas, SciPy, Matplotlib, and Seaborn for data science.

Key Features
- Get familiar with the built-in data structures, functional programming, and datetime objects.
- Handle heavy datasets by optimizing data types for memory management, reading files in chunks, and using Dask and Modin.
- Use time-series analysis to find trends, seasonality, and cyclic components.
- Use Seaborn to build aesthetic plots with high-level interfaces and customized themes.
- Perform exploratory data analysis on real-world datasets to maximize the insights about the data.

Description
The book starts with quick introductions to Python and its ecosystem of data science libraries: JupyterLab, NumPy, pandas, SciPy, Matplotlib, and Seaborn. It teaches Python data structures and the essential concepts required for data engineering, such as functions, lambdas, list comprehensions, and datetime objects. It also gives an in-depth understanding of the Python data science packages: JupyterLab is used as an IDE for writing, documenting, and executing Python code; NumPy is used for numerical computation; and pandas is used for cleaning and reorganizing data, handling large datasets, and merging DataFrames to get meaningful insights. You will work through statistics with SciPy to understand the relationships between variables, and build visualization charts with the Matplotlib and Seaborn libraries.

What you will learn
- Learn about Python data containers, their methods, and attributes.
- Learn NumPy arrays for the computation of numerical data.
- Learn the pandas data structures, DataFrames and Series.
- Learn statistics: measures of central tendency, the central limit theorem, confidence intervals, and hypothesis testing.
- Get a brief understanding of visualization: control and draw the different built-in charts to extract important variables and detect outliers and anomalies using Matplotlib and Seaborn.

Who this book is for
This book is for anyone who wants to use Python for data analysis and visualization. It is for novices as well as experienced readers with a working knowledge of the pandas library. Basic knowledge of Python is a must.

Table of Contents
1. Introduction to Data Analysis
2. Jupyter lab
3. Python overview
4. Introduction to Numpy
5. Introduction to Pandas
6. Data Analysis
7. Time-Series Analysis
8. Introduction to Statistics
9. Matplotlib
10. Seaborn
11. Exploratory Data Analysis

About the Author
Purna Chander Rao.Kathula is a data science enthusiast, data manager, seasoned programmer, and technical trainer with 17+ years of experience in a vast array of languages, including Perl, C, C++, Java, and Python, and in a wide variety of domains such as insurance, ad-tech, storage, gaming, mobility, big data, and analytics. He holds the Applied Data Science with Python Specialization certificate from the University of Michigan on Coursera. He graduated from the College of Engineering G.I.T.A.M with a degree in Mechanical Engineering. He is a frequent speaker at data science and data engineering user groups, and he regularly delivers webinars and conducts training on Hadoop, big data, data analysis, and visualization technologies.

Blog links:
https://blog.imaginea.com/author/purna-chander-rao-kathula/
https://www.slideshare.net/PurnaChander1
https://www.slideshare.net/sriganesha/hive-and-data-analysis-using-pandas

LinkedIn profile:
https://www.linkedin.com/in/purna-chander-rao-kathula-043852a/

Hands-on Data Analysis and Visualization with Pandas

Engineer, Analyse and Visualize Data, Using Powerful Python Libraries

Purna Chander Rao. Kathula
www.bpbonline.com
FIRST EDITION 2020
Copyright © BPB Publications, India
ISBN: 978-93-89845-648
All Rights Reserved. No part of this publication may be reproduced or distributed in any form or by any means or stored in a database or retrieval system, without the prior written permission of the publisher, with the exception of the program listings, which may be entered, stored and executed in a computer system, but cannot be reproduced by means of publication.
LIMITS OF LIABILITY AND DISCLAIMER OF WARRANTY
The information contained in this book is true and correct to the best of the author’s and publisher’s knowledge. The author has made every effort to ensure the accuracy of these publications, but cannot be held responsible for any loss or damage arising from any information in this book.
All trademarks referred to in the book are acknowledged as properties of their respective owners but BPB Publications cannot guarantee the accuracy of this information.
Distributors:
BPB PUBLICATIONS
20, Ansari Road, Darya Ganj
New Delhi-110002
Ph: 23254990/23254991
MICRO MEDIA
Shop No. 5, Mahendra Chambers,
150 DN Rd. Next to Capital Cinema,
V.T. (C.S.T.) Station, MUMBAI-400 001
Ph: 22078296/22078297
DECCAN AGENCIES
4-3-329, Bank Street,
Hyderabad-500195
Ph: 24756967/24756400
BPB BOOK CENTRE
376 Old Lajpat Rai Market,
Delhi-110006
Ph: 23861747
Published by Manish Jain for BPB Publications, 20 Ansari Road, Darya Ganj, New Delhi-110002 and Printed by him at Repro India Ltd, Mumbai
Dedicated to
My Guru Kalyan Ram Kuppachi Vice President, Engineering (Pramati Technologies Pvt Ltd)
About the Author
Purna Chander is currently working as a Data Architect with Pramati Technologies Pvt Ltd, Hyderabad. He has around 17 years of experience working in a wide variety of domains such as insurance, mobility, HRMS, storage, databases, search engines, ad-tech, gaming, big data, and analytics. He holds a Bachelor’s degree (B.Tech) in mechanical engineering from the College of Engineering G.I.T.A.M.
He is a data science enthusiast and a seasoned software programmer in a vast array of programming languages, including Perl, C, C++, Java, and Python. He is Coursera-certified in the Applied Data Science with Python Specialization from the University of Michigan. He is a frequent speaker at data science and data engineering user groups, and he regularly delivers webinars and conducts training on Hadoop, big data, data analysis, and visualization.
About the Reviewer
Sampath Kumar Maddula is a passionate software engineer with strong coding skills in Python and SQL who enjoys building analytical and data science-centric products. He has 8 years of experience in building analytical dashboards, ETL and data engineering, data pipelines, and scalable cloud data processing frameworks. He has worked for MNC clients such as Standard Chartered Bank and Hyperion Insurance Group, and for fast-moving startups such as Castlight Health, Sema4Genomics, and Clara Analytics. Over the past three years he has earned certifications in Apache Spark with Scala, Machine Learning & AI Foundation, Python Programming Efficiently, Python Design Patterns, and Hadoop - Spark Starter Kit, 2017.
In addition, Sampath is well-versed in technologies related to Distributed Design Architectures, Database Design, Machine Learning, and Data Science. He holds a Master’s degree in Information Systems from Birla Institute of Technology and Science, Pilani and he is currently working as a Principal Engineer at Pramati Technologies, Hyderabad.
Acknowledgement
First and foremost, I would like to thank God for giving me the courage to write this book. A warm thanks to all the members of the BPB Publications team for giving me this opportunity to publish my book.
I would like to thank my family for their support, and for helping me in numerous ways. Writing this book was not an easy task. I would also like to thank all my friends for their useful discussions and suggestions, and for providing moral support when needed.
Lastly, I would like to thank my critics. Without their criticism, I would never have been able to write this book.
Preface
Python is a multi-paradigm programming language. It supports object-oriented, procedural, functional, and imperative programming and has a large and comprehensive standard library. Python is open source, simple to learn, and runs on the major operating systems, including Windows, Linux, and macOS. It is used in many domains, such as web and internet development, the Internet of Things, desktop GUIs, gaming, DevOps, big data, web testing and automation, and AI and data science. This book focuses on two subfields of data science: data analysis and visualization. Data analysis is the core area in which data scientists spend most of their time, cleaning and organizing data. The main aim of the book is to teach the use of Python's data science libraries. It guides you through basic and advanced Python concepts that help in data manipulation, such as list comprehensions, lambdas, and functional programming, and it contains many examples and real-world datasets to help you understand the concepts better. The book is divided into 11 chapters and provides a detailed understanding of Python data science libraries such as JupyterLab, NumPy, pandas, SciPy, Matplotlib, and Seaborn, which help in cleaning and reorganizing data, data analysis, and visualization.
Chapter 1 introduces the core concepts of data science, machine learning, artificial intelligence, and the different processes involved in data analysis. It also describes why Python is used as an essential tool in the data science domain, the core libraries used for data analysis, and the installation process.
Chapter 2 addresses the fundamental tool, JupyterLab, which is used as an IDE to create and share documents ranging from Python code to a full-blown report. This chapter covers the architecture of JupyterLab and its components, such as cells and cell modes for writing code and documenting it in Markdown, and the use of keyboard shortcuts and toolbars.
Chapter 3 gives an overview of Python, with basic concepts about the data types and their methods. This chapter also covers functions, lambdas, list comprehensions, functional programming, and datetime objects, which are used in data analysis and visualization.
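As a rough taste of the Chapter 3 topics, the following minimal sketch uses only the Python standard library. The `prices` list and the variable names are invented for illustration and are not taken from the book.

```python
from datetime import datetime, timedelta
from functools import reduce

# Hypothetical sample data, invented for illustration
prices = [120, 80, 42, 310]

# Lambda: a small anonymous function
double = lambda p: p * 2

# List comprehension: filter and transform in one expression
doubled_over_50 = [double(p) for p in prices if p > 50]

# Functional style: reduce folds the list into a total, filter keeps matches
total = reduce(lambda a, b: a + b, prices)
cheap = list(filter(lambda p: p < 100, prices))

# Datetime objects: parse a string, then do date arithmetic
start = datetime.strptime("2020-09-03", "%Y-%m-%d")
one_week_later = start + timedelta(days=7)

print(doubled_over_50)                       # [240, 160, 620]
print(total, cheap)                          # 552 [80, 42]
print(one_week_later.strftime("%Y-%m-%d"))   # 2020-09-10
```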
Chapter 4 briefly explains the NumPy library and its use for numerical computation. It compares Python lists and NumPy arrays in terms of internal storage, type checking, and execution speed. This chapter also covers slicing and dicing of NumPy arrays, statistical operations, fancy indexing, and broadcasting.
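The slicing, fancy indexing, and broadcasting mentioned above can be sketched as follows. This is an illustrative example on a made-up 3x4 array, not a listing from the book.

```python
import numpy as np

# Made-up 3x4 array holding the integers 0..11
arr = np.arange(12).reshape(3, 4)

# Slicing: rows 0-1 and columns 1-2
block = arr[:2, 1:3]            # [[1, 2], [5, 6]]

# Fancy indexing: pick the elements at (0, 1) and (2, 3)
picked = arr[[0, 2], [1, 3]]    # [1, 11]

# Statistical operation: mean of each column
col_means = arr.mean(axis=0)    # [4., 5., 6., 7.]

# Broadcasting: the (4,) vector is stretched across all 3 rows
centered = arr - col_means

print(block)
print(picked)
print(centered)
```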
Chapter 5 guides you through a basic introduction to pandas. It covers the pandas data structures, Series and DataFrames, and their methods and attributes. We also run through a real-world sample dataset to understand the concepts better.
Chapter 6 covers handling different file formats; header manipulation; filtering data by rows, columns, and indexes; groupby operations and aggregations on groupby objects; concatenating and merging DataFrames; filling in missing data; pivot tables and crosstabs; and handling large datasets using various methodologies.
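A few of the Chapter 6 operations can be sketched on toy data. The `sales` and `regions` frames below are hypothetical and exist only to make the example self-contained; filling missing revenue with 0 is an assumption chosen for simplicity.

```python
import pandas as pd

# Hypothetical data, invented for illustration
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "revenue": [100.0, None, 80.0, 120.0],
})
regions = pd.DataFrame({"store": ["A", "B"], "region": ["North", "South"]})

# Fill the missing value (assumption: treat missing revenue as 0)
sales["revenue"] = sales["revenue"].fillna(0.0)

# Groupby with several aggregations at once
totals = sales.groupby("store")["revenue"].agg(["sum", "mean"])

# Merge: attach the region of each store (left join on the shared key)
merged = sales.merge(regions, on="store", how="left")

# Pivot table: regions as rows, months as columns, summed revenue as values
pivot = merged.pivot_table(index="region", columns="month",
                           values="revenue", aggfunc="sum")

print(totals)
print(pivot)
```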
Chapter 7 addresses creating date ranges using parameters such as start and end dates, periods, and frequencies (yearly, monthly, hourly, and by the second); converting strings and Unix timestamps to datetime objects; time-series analysis on a real-world dataset to find insights about the data; and handling different time zones and holidays.
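The date-handling steps listed above might look like this in pandas. The dates, the Unix timestamp, and the target time zone are arbitrary choices made for this sketch.

```python
import pandas as pd

# Date range from a start and an end date (daily frequency by default)
days = pd.date_range(start="2020-01-01", end="2020-01-05")

# Date range from a start date, a number of periods, and a frequency
# ("MS" means month start)
month_starts = pd.date_range(start="2020-01-01", periods=3, freq="MS")

# Convert date strings to datetime objects
from_strings = pd.to_datetime(["2020-09-03", "2020-09-04"])

# Convert a Unix timestamp (seconds since 1970-01-01 UTC)
from_unix = pd.to_datetime([1599091200], unit="s")

# Time zones: mark a naive timestamp as UTC, then convert it
utc_noon = pd.Timestamp("2020-09-03 12:00").tz_localize("UTC")
local = utc_noon.tz_convert("Asia/Kolkata")

print(days)
print(month_starts)
print(from_unix[0], local)
```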
Chapter 8 gives a brief introduction to statistics. This chapter covers population and sample, measures of central tendency (mean, median, and mode), inferential and descriptive statistics, standardization, the central limit theorem, confidence intervals, and hypothesis testing, along with practical examples for each topic.
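To make a few of these terms concrete, here is a small sketch using Python's standard `statistics` module rather than SciPy, which the book uses. The sample values are invented, and the confidence interval uses a normal approximation; for a sample this small a t-distribution would normally be preferred.

```python
import math
import statistics
from statistics import NormalDist

# Invented sample, for illustration only
sample = [12, 15, 11, 15, 14, 13, 15, 17]

# Measures of central tendency
mean = statistics.mean(sample)      # 14
median = statistics.median(sample)  # 14.5
mode = statistics.mode(sample)      # 15

# Standardization: z-score of each observation
stdev = statistics.stdev(sample)    # sample standard deviation
z_scores = [(x - mean) / stdev for x in sample]

# 95% confidence interval for the mean (normal approximation)
z = NormalDist().inv_cdf(0.975)     # roughly 1.96
margin = z * stdev / math.sqrt(len(sample))
ci = (mean - margin, mean + margin)

print(mean, median, mode)
print(ci)
```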
Chapter 9 explains the concept of data visualization using Matplotlib. This chapter gives a brief introduction to the Matplotlib architecture (the backend, artist, and scripting layers), along with examples and the parameters and methods that control the plots. It also covers the different built-in plots: scatter plots, bar plots, line charts, pie charts, histograms, and subplots.
Chapter 10 introduces the concept of data visualization using Seaborn. This chapter covers statistical visualization using a real-world Pokemon dataset. It also covers visualizing statistical relationships between variables, plotting categorical data, and visualizing the distribution of data using univariate and bivariate features.
Chapter 11 combines all the previous chapters in what is called exploratory data analysis. To give a better understanding of this chapter, we have taken a real-world dataset (Titanic). This chapter covers the analysis of the dataset; filling in blank or null values; variable identification; visualizing the information using univariate and multivariate analysis; handling outliers; and finding insights about the data and making inferences using visualization techniques such as scatter plots, bar plots, line plots, and heatmaps.
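Two of the steps named above, filling null values and flagging outliers, can be sketched on a tiny made-up table. The column names echo the Titanic dataset, but the values are invented, and the median fill and the 1.5 x IQR rule are common conventions assumed here rather than the book's confirmed method.

```python
import pandas as pd

# Tiny made-up table; values are NOT from the real Titanic dataset
df = pd.DataFrame({
    "age": [22, 38, None, 35, 54, 2],
    "fare": [7.25, 71.28, 7.92, 53.1, 51.86, 512.33],
})

# Fill null ages with the median of the known ages
df["age"] = df["age"].fillna(df["age"].median())

# Flag fare outliers with the 1.5 * IQR rule
q1 = df["fare"].quantile(0.25)
q3 = df["fare"].quantile(0.75)
iqr = q3 - q1
is_outlier = (df["fare"] < q1 - 1.5 * iqr) | (df["fare"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]

print(df)
print(outliers)
```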
Downloading the code bundle and coloured images:
Please follow the link to download the Code Bundle and the Coloured Images of the book:
https://rebrand.ly/5d390
Errata
We take immense pride in our work at BPB Publications and follow best practices to ensure the accuracy of our content and to provide an engaging reading experience to our subscribers. Our readers are our mirrors, and we use their inputs to reflect and improve upon human errors if any, occurred during the publishing proc