claves pandas para python | Guías, Proyectos, Investigaciones de Programación Informática

Creating DataFrames Reshaping Data –Change layout, sorting, reindexing, renaming

pd.melt(df)

Gather columns into rows. df.pivot(columns='var', values='val')

Spread rows into columns.

pd.concat([df1,df2])

Append rows of DataFrames

pd.concat([df1,df2], axis=1)

Append columns of DataFrames

df.sort_values('mpg')

Order rows by values of a column (low to high).

df.sort_values('mpg’, ascending=False)

Order rows by values of a column (high to low).

df.rename(columns = {'y':'year'})

Rename the columns of a DataFrame

df.sort_index()

Sort the index of a DataFrame

df.reset_index()

Reset index of DataFrame to row numbers, moving

index to columns.

df.drop(columns=['Length’, 'Height'])

Drop columns from DataFrame

df = pd.DataFrame(

{"a" : [4, 5, 6],

"b" : [7, 8, 9],

"c" : [10, 11, 12]},

index = [1, 2, 3])

Specify values for each column.

df = pd.DataFrame(

[[4, 7, 10],

[5, 8, 11],

[6, 9, 12]],

index=[1, 2, 3],

columns=['a', 'b', 'c'])

Specify values for each row.

df = pd.DataFrame(

{"a" : [4 ,5, 6],

"b" : [7, 8, 9],

"c" : [10, 11, 12]},

index = pd.MultiIndex.from_tuples(

[('d’, 1), ('d’, 2),

('e’, 2)], names=['n’, 'v']))

Create DataFrame with a MultiIndex

Method Chaining

Most pandas methods return a DataFrame so that

another pandas method can be applied to the result.

This improves readability of code.

df = (pd.melt(df)

.rename(columns={

'variable':'var',

'value':'val'})

.query('val >= 200')

)

Logic in Python (and pandas)

Less than

Not equal to

Greater than

df.column.isin

(values)

Group membership

Equals

pd.isnull

(obj)

NaN

Less than or equals

pd.notnull

(obj)

Is not

NaN

Greater than or equals

&,|,~,^,

df.any(),df.all()

Logical

and, or, not, xor, any, all

regex (Regular Expressions) Examples

\.'

Matches strings containing

a period '.'

'Length$'

Matches strings ending with word 'Length'

'^Sepal'

Matches strings beginning with the word 'Sepal'

'^x[1

-5]$'

Matches strings beginning with 'x' and ending with 1,2,3,4,5

'^(?!

Species$).*'

Matches strings except

the string 'Species'

Pandas API Reference Pandas User Guide

Data Wrangling

with pandas Cheat Sheet

http://pandas.pydata.org

Tidy Data –A foundation for wrangling in pandas

In a tidy

data set:

Each variable is saved

in its own column

&Each observation is

saved in its own row

Tidy data complements pandas’s vectorized

operations. pandas will automatically preserve

observations as you manipulate variables. No

other format works as intuitively with pandas.

M A

df[df.Length > 7]

Extract rows that meet logical criteria.

df.drop_duplicates()

Remove duplicate rows (only considers columns).

df.sample(frac=0.5)

Randomly select fraction of rows.

df.sample(n=10) Randomly select n rows.

df.nlargest(n, 'value’)

Select and order top n entries.

df.nsmallest(n, 'value')

Select and order bottom n entries.

df.head(n)

Select first n rows.

df.tail(n)

Select last n rows.

df[['width’, 'length’, 'species']]

Select multiple columns with specific names.

df['width'] or df.width

Select single column with specific name.

df.filter(regex='regex')

Select columns whose name matches

regular expression regex.

df.iloc[10:20]

Select rows 10-20.

df.iloc[:, [1, 2, 5]]

Select columns in positions 1, 2 and 5 (first

column is 0).

df.loc[:, 'x2':'x4']

Select all columns between x2 and x4 (inclusive).

df.loc[df['a'] > 10, ['a’, 'c']]

Select rows meeting logical condition, and only

the specific columns .

df.iat[1, 2] Access single value by index

df.at[4, 'A'] Access single value by label

Subset Observations - rows Subset Variables - columns Subsets - rows and columns

Use df.loc[] and df.iloc[] to select only

rows, only columns or both.

Use df.at[] and df.iat[] to access a single

value by row and column.

First index selects rows, second index columns.

Cheatsheet for pandas (http://pandas.pydata.org/ originally written by Irv Lustig, Princeton Consultants , inspired by Rstudio Data Wrangling Cheatsheet

Using query

query() allows Boolean expressions for filtering

rows.

df.query('Length > 7')

df.query('Length > 7 and Width < 8')

df.query('Name.str.startswith("abc")',

engine="python")

claves pandas para python, Guías, Proyectos, Investigaciones de Programación Informática

Documentos relacionados

Vista previa parcial del texto

¡Descarga claves pandas para python y más Guías, Proyectos, Investigaciones en PDF de Programación Informática solo en Docsity!

Creating DataFrames Reshaping Data – Change layout, sorting, reindexing, renaming

Method Chaining

Pandas API Reference Pandas User Guide

Data Wrangling

In a tidy

data set:

Each variable is saved

in its own column

Each observation is

saved in its own row

Tidy data complements pandas’s vectorized

operations. pandas will automatically preserve

observations as you manipulate variables. No

other format works as intuitively with pandas.

Summarize Data

Make New Columns

Combine Data Sets

of rows in DataFrame.

of distinct values in a column.

adf bdf

ydf zdf

Group Data

Windows

Handling Missing Data

Plotting