Well done documents for boards, Summaries of Computer science

Typology: Summaries

2022/2023

Uploaded on 01/27/2023

sweta_biswasroy
Chapter 1
Data Handling using Pandas
New syllabus 2022-23
Visit : python.mykvs.in for regular updates


Python Library Pandas

Pandas is one of the most popular Python packages for data science. It offers powerful and flexible data structures that make data analysis and manipulation easy, and it makes importing and analyzing data much easier. Pandas builds on packages like NumPy and matplotlib to give us a single, convenient place for data analysis and visualization work.

Basic Features of Pandas

  1. DataFrame objects help a lot in keeping track of our data.
  2. With a pandas DataFrame, we can have different data types (float, int, string, datetime, etc.) all in one place.
  3. Pandas has built-in functionality for easy grouping and joining of data, and for rolling windows.
  4. Good I/O capabilities; for example, data can be pulled from a MySQL database directly into a DataFrame.
  5. With pandas, you can use patsy for R-style formula syntax when doing regressions.
  6. Tools for loading data into in-memory data objects from different file formats.
  7. Data alignment and integrated handling of missing data.
  8. Reshaping and pivoting of data sets.
  9. Label-based slicing, indexing and subsetting of large data sets.
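
A few of the features listed above can be sketched in a short program. This is only an illustration: the student names, sections, marks and dates are made-up sample data, not taken from the notes.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame(
    {
        "name": ["Asha", "Ravi", "Meena", "John"],
        "section": ["A", "B", "A", "B"],
        "marks": [82.5, np.nan, 91.0, 67.0],  # one value is missing
        "joined": pd.to_datetime(
            ["2022-04-01", "2022-04-03", "2022-04-01", "2022-04-05"]
        ),
    },
    index=["s1", "s2", "s3", "s4"],
)

# Different data types sit side by side in one DataFrame.
print(df.dtypes)

# Integrated handling of missing data: mean() skips NaN by default.
mean_marks = df["marks"].mean()
print(mean_marks)

# Easy grouping: average marks per section.
section_means = df.groupby("section")["marks"].mean()
print(section_means)

# Label-based slicing: rows s1 through s3 (both end labels are included).
subset = df.loc["s1":"s3", ["name", "marks"]]
print(subset)
```

Note :- label-based slices with loc include the end label, unlike ordinary Python slices.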

Pandas – Installation/Environment Setup

The pandas module does not come bundled with standard Python. If we install the Anaconda Python distribution, pandas is installed by default.

Steps for Anaconda installation and use:

  1. Visit the site https://www.anaconda.com/download/
  2. Download the appropriate Anaconda installer.
  3. After the download, install it.
  4. During installation, check the options for setting the path and for all users.
  5. After installation, start the Spyder utility of Anaconda from the Start menu.
  6. Type import pandas as pd in the left pane (temp.py).
  7. Then run it.
  8. If no error is shown, pandas is installed.
  9. Like the default temp.py, we can create another .py file for a new program from the New window option of the File menu.

Pandas – Installation/Environment Setup (standard Python distribution)

  4. Move to the Scripts folder of the Python distribution in the command prompt (opened with the cmd command on Windows).
  5. Execute the following commands in the command prompt one after another, waiting for each installation to finish:

    pip install numpy
    pip install six
    pip install pandas

     Now we will be able to use pandas in the standard Python distribution.
  6. Type import pandas as pd in the Python (IDLE) shell.
  7. If it executes without error, pandas is installed on your system.
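
The final check can also be written as a small script. This is a minimal sketch; the helper name is_installed is hypothetical and is not part of pandas or pip.

```python
import importlib


def is_installed(module_name):
    """Return True if module_name can be imported, otherwise False."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


# Report the status of the packages that the pip commands install.
for name in ("numpy", "six", "pandas"):
    print(name, "installed:", is_installed(name))

# Print the pandas version only when the import works.
if is_installed("pandas"):
    import pandas as pd
    print("pandas version:", pd.__version__)
```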

Data Structures in Pandas

Two important data structures of pandas are Series and DataFrame.

  1. Series

A Series is a one-dimensional, array-like structure with homogeneous data. For example, a Series can be a collection of integers. Basic features of a Series are:

❖ Homogeneous data
❖ Size immutable
❖ Values of data mutable
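
The properties above can be tried out with a small sketch. The values are made up, and drop is used here only as one example of an operation that returns a new Series rather than resizing the original.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])

# Homogeneous data: every element shares one dtype.
print(s.dtype)

# Values are mutable: the element with label 1 is changed in place.
s[1] = 25
print(s)

# drop() returns a new, shorter Series; the original keeps its size.
shorter = s.drop(0)
print(len(s), len(shorter))
```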

Pandas Series

A Series is like a one-dimensional array capable of holding data of any type (integer, string, float, Python objects, etc.). A Series can be created using the constructor.

Syntax :- pandas.Series(data, index, dtype, copy)

A Series can be created from an ndarray, a dictionary or a scalar value, that is, using:

  1. Array
  2. Dict
  3. Scalar value or constant
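
As a sketch of the first option, the example below creates a Series from a NumPy array and uses each constructor parameter from the syntax above. The array values and labels are made up for illustration.

```python
import numpy as np
import pandas as pd

arr = np.array([1.5, 2.5, 3.5])

# data only: pandas supplies the default index 0, 1, 2.
s_default = pd.Series(arr)
print(s_default)

# data + index + dtype: custom labels and an explicit data type.
s_labeled = pd.Series(arr, index=["x", "y", "z"], dtype="float64")
print(s_labeled)

# copy=True: the Series gets its own copy of the data, so a later
# change to arr does not show up in the Series.
s_copy = pd.Series(arr, copy=True)
arr[0] = 99.0
print(s_copy.iloc[0])
```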

Pandas Series
Create an Empty Series

e.g.

    import pandas as pd
    s = pd.Series()
    print(s)

Output

    Series([], dtype: float64)

Note :- pandas 2.0 and later print dtype: object for an empty Series; older versions print float64.

Pandas Series
Create a Series from dict

Eg.1 (without index)

    import pandas as pd
    data = {'a' : 0., 'b' : 1., 'c' : 2.}
    s = pd.Series(data)
    print(s)

Output

    a    0.0
    b    1.0
    c    2.0
    dtype: float64

Eg.2 (with index)

    import pandas as pd
    data = {'a' : 0., 'b' : 1., 'c' : 2.}
    # Labels missing from the dict (here 'd') get NaN
    s = pd.Series(data, index=['b','c','d','a'])
    print(s)

Output

    b    1.0
    c    2.0
    d    NaN
    a    0.0
    dtype: float64

Create a Series from Scalar

e.g.

    import pandas as pd
    s = pd.Series(5, index=[0, 1, 2, 3])
    print(s)

Output

    0    5
    1    5
    2    5
    3    5
    dtype: int64

Note :- here 5 is repeated 4 times (once for each index label).

Pandas Series
Head function

e.g.

    import pandas as pd
    s = pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
    print(s.head(3))

Output

    a    1
    b    2
    c    3
    dtype: int64

Returns the first 3 elements.

Pandas Series
Tail function

e.g.

    import pandas as pd
    s = pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
    print(s.tail(3))

Output

    c    3
    d    4
    e    5
    dtype: int64

Returns the last 3 elements.

Pandas Series
Retrieve Data Using Label (as Index)

e.g.

    import pandas as pd
    s = pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
    print(s[['c','d']])

Output

    c    3
    d    4
    dtype: int64

Pandas Series
Retrieve Data from selection

There are three methods for data selection:

▪ loc gets rows (or columns) with particular labels from the index.

▪ iloc gets rows (or columns) at particular positions in the index (so it only takes integers).

▪ ix usually tries to behave like loc but falls back to behaving like iloc if a label is not present in the index.

ix is deprecated (and was removed in pandas 1.0), so the use of loc and iloc is encouraged instead.
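
The difference between loc and iloc can be sketched as follows. The Series matches the earlier examples; the small DataFrame and its labels are made up for illustration.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

# loc selects by label, iloc selects by integer position.
by_label = s.loc["b"]
by_position = s.iloc[1]
print(by_label, by_position)

# Slices: loc includes the end label, iloc excludes the end position.
label_slice = s.loc["b":"d"]    # labels b, c, d
position_slice = s.iloc[1:3]    # positions 1 and 2, i.e. labels b and c
print(label_slice)
print(position_slice)

# The same idea on a DataFrame: row selector first, then column selector.
df = pd.DataFrame(
    {"x": [10, 20, 30], "y": [1.0, 2.0, 3.0]},
    index=["r1", "r2", "r3"],
)
cell_by_label = df.loc["r2", "y"]
cell_by_position = df.iloc[1, 1]
print(cell_by_label, cell_by_position)
```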