Importing Data Python Cheat Sheet.pdf | Computer Science Notes

Python For Data Science Cheat Sheet

Importing Data

Learn Python for data science Interactively at www.DataCamp.com

Importing Data in Python

Most of the time, you'll use either NumPy or pandas to import your data:

>>> import numpy as np
>>> import pandas as pd

Text Files

Plain Text Files

>>> filename = 'huck_finn.txt'
>>> file = open(filename, mode='r')   # Open the file for reading
>>> text = file.read()                # Read a file's contents
>>> print(file.closed)                # Check whether file is closed
>>> file.close()                      # Close file
>>> print(text)

Using the context manager with

>>> with open('huck_finn.txt', 'r') as file:
...     print(file.readline())        # Read a single line
...     print(file.readline())
...     print(file.readline())
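As a minimal sketch of the context-manager pattern above, the following writes a small sample file and reads it back. The file name `sample.txt` and its contents are hypothetical, created in a temporary directory so the example runs anywhere.

```python
import os
import tempfile

# Hypothetical sample file, created here so the example is self-contained.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.txt")
with open(sample_path, "w", encoding="utf-8") as out:
    out.write("first line\nsecond line\nthird line\n")

# The with block closes the file automatically, even if an error occurs.
with open(sample_path, "r", encoding="utf-8") as f:
    first = f.readline()                           # one line, newline included
    remaining = [line.rstrip("\n") for line in f]  # iterate over the rest

print(first.rstrip("\n"))
print(remaining)
print(f.closed)  # True: the file was closed when the with block ended
```

Iterating over the file object reads one line at a time, which keeps memory use low for large files compared with `file.read()`.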

Table Data: Flat Files

Importing Flat Files with numpy

Files with one data type

>>> filename = 'mnist.txt'
>>> data = np.loadtxt(filename,
...                   delimiter=',',   # String used to separate values
...                   skiprows=2,      # Skip the first 2 lines
...                   usecols=[0,2],   # Read the 1st and 3rd column
...                   dtype=str)       # The type of the resulting array

Files with mixed data types

>>> filename = 'titanic.csv'
>>> data = np.genfromtxt(filename,
...                      delimiter=',',
...                      names=True,   # Look for column header
...                      dtype=None)

>>> data_array = np.recfromcsv(filename)

The default dtype of the np.recfromcsv() function is None. Recent NumPy releases (2.0 and later) deprecate np.recfromcsv(); np.genfromtxt() with delimiter=',' covers the same use.
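The difference between the two loaders is easier to see on small in-memory inputs. This is a sketch only: the CSV contents and column names below are made up for illustration, and `io.StringIO` stands in for a file on disk.

```python
import io

import numpy as np

# Hypothetical numeric file: two preamble lines, then comma-separated numbers.
numeric_csv = io.StringIO("generated sample\nheader line\n1,2,3\n4,5,6\n")
one_type = np.loadtxt(numeric_csv, delimiter=",", skiprows=2, usecols=[0, 2])

# Hypothetical mixed-type file: a header row, then text and numeric columns.
mixed_csv = io.StringIO("name,age,fare\nAllen,29,211.3\nBraund,22,7.25\n")
mixed = np.genfromtxt(mixed_csv, delimiter=",", names=True,
                      dtype=None, encoding="utf-8")

print(one_type)           # plain 2-D float array
print(mixed.dtype.names)  # field names taken from the header row
print(mixed["age"])       # access a column of the structured array by name
```

`np.loadtxt` returns a regular array with a single dtype, while `np.genfromtxt` with `dtype=None` infers a type per column and returns a structured array.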

Importing Flat Files with pandas

>>> filename = 'winequality-red.csv'
>>> df = pd.read_csv(filename,
...                  nrows=5,          # Number of rows of file to read
...                  header=None,      # Row number to use as col names
...                  sep='\t',         # Delimiter to use
...                  comment='#',      # Character to split comments
...                  na_values=[""])   # String to recognize as NA/NaN

Exploring Your Data

pandas DataFrames

>>> df.head()                 # Return first DataFrame rows
>>> df.tail()                 # Return last DataFrame rows
>>> df.index                  # Describe index
>>> df.columns                # Describe DataFrame columns
>>> df.info()                 # Info on DataFrame
>>> data_array = df.values    # Convert a DataFrame to a NumPy array
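To see how the `read_csv` options above interact, here is a small sketch on an in-memory, tab-separated sample. The values are invented for illustration and `io.StringIO` replaces the real file.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample with a comment line and one missing value.
raw = io.StringIO(
    "# sample data, not the real wine-quality file\n"
    "7.4\t0.70\t5\n"
    "7.8\t0.88\t\n"
    "11.2\t0.28\t6\n"
)

df = pd.read_csv(raw, sep="\t", header=None, comment="#",
                 nrows=2, na_values=[""])

print(df.head())          # first rows (only 2 were read because of nrows=2)
print(df.columns)         # integer labels, since header=None
data_array = df.values    # NumPy array holding the same data
print(data_array.shape)
```

With `header=None`, pandas labels the columns 0, 1, 2, and the empty third field in the second row is read as NaN.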

Pickled Files

>>> import pickle
>>> with open('pickled_fruit.pkl', 'rb') as file:
...     pickled_data = pickle.load(file)
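A pickle file has to be written before it can be loaded. The round trip below is a minimal sketch; the dictionary contents are hypothetical and the file is placed in a temporary directory.

```python
import os
import pickle
import tempfile

fruit = {"apple": 3, "banana": 5}  # hypothetical data to serialize
pkl_path = os.path.join(tempfile.mkdtemp(), "pickled_fruit.pkl")

# Pickle files are binary, so open with 'wb' to write and 'rb' to read.
with open(pkl_path, "wb") as file:
    pickle.dump(fruit, file)

with open(pkl_path, "rb") as file:
    pickled_data = pickle.load(file)

print(pickled_data)
```

Only unpickle files from sources you trust, because loading a pickle can execute arbitrary code.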

Excel Spreadsheets

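>>> file = 'urbanpop.xlsx'
>>> data = pd.ExcelFile(file)
>>> df_sheet2 = data.parse('1960-1966',
...                        skiprows=[0],
...                        names=['Country',
...                               'AAM: War(2002)'])
>>> df_sheet1 = data.parse(0,
...                        usecols=[0],     # usecols replaces the older parse_cols
...                        skiprows=[0],
...                        names=['Country'])

To access the sheet names, use the sheet_names attribute:

>>> data.sheet_names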

Navigating Your FileSystem

os Library

>>> import os
>>> path = "/usr/tmp"
>>> wd = os.getcwd()                      # Store the name of current directory in a string
>>> os.listdir(wd)                        # Output contents of the directory in a list
>>> os.chdir(path)                        # Change current working directory
>>> os.rename("test1.txt", "test2.txt")   # Rename a file
>>> os.remove("test2.txt")                # Delete an existing file
>>> os.mkdir("newdir")                    # Create a new directory
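The os calls above change real files and the process-wide working directory, so it can help to try them in a scratch location first. This sketch runs the same sequence inside a temporary directory and then restores the original working directory; the file and directory names are the illustrative ones from the cheat sheet.

```python
import os
import tempfile

start_dir = os.getcwd()         # remember where we started
work_dir = tempfile.mkdtemp()   # scratch directory for the demo

try:
    os.chdir(work_dir)
    with open("test1.txt", "w") as f:
        f.write("demo")
    os.rename("test1.txt", "test2.txt")
    os.mkdir("newdir")
    after_rename = sorted(os.listdir("."))
    os.remove("test2.txt")
    after_remove = sorted(os.listdir("."))
finally:
    os.chdir(start_dir)         # always go back, even if a step failed

print(after_rename)
print(after_remove)
```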

SAS Files

>>> from sas7bdat import SAS7BDAT

>>> with SAS7BDAT('urbanpop.sas7bdat') as file:
...     df_sas = file.to_data_frame()

Stata Files

>>> data = pd.read_stata('urbanpop.dta')

HDF5 Files

>>> import h5py

>>> filename = 'H-H1_LOSC_4_v1-815411200-4096.hdf5'

>>> data = h5py.File(filename, 'r')

Matlab Files

>>> import scipy.io

>>> filename = 'workspace.mat'

>>> mat = scipy.io.loadmat(filename)

Relational Databases

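>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite:///Northwind.sqlite')

Use the table_names() method to fetch a list of table names:

>>> table_names = engine.table_names()

SQLAlchemy 2.0 removed table_names(); there, use the inspector instead:

>>> from sqlalchemy import inspect
>>> table_names = inspect(engine).get_table_names()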

Querying Relational Databases

>>> from sqlalchemy import text               # text() is required for raw SQL strings in SQLAlchemy 2.0
>>> con = engine.connect()
>>> rs = con.execute(text("SELECT * FROM Orders"))
>>> df = pd.DataFrame(rs.fetchall())
>>> df.columns = rs.keys()
>>> con.close()

Using the context manager with

>>> with engine.connect() as con:
...     rs = con.execute(text("SELECT OrderID FROM Orders"))
...     df = pd.DataFrame(rs.fetchmany(size=5))
...     df.columns = rs.keys()
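The fetch pattern above (run a query, fetch some or all rows, read the column names) can be tried without a database file. The sketch below deliberately swaps SQLAlchemy for the standard library's `sqlite3` module and an in-memory database; the `Orders` table and its rows are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical Orders table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Orders (OrderID INTEGER, ShipCity TEXT)")
con.executemany("INSERT INTO Orders VALUES (?, ?)",
                [(1, "Reims"), (2, "Lyon"), (3, "Graz")])

cur = con.execute("SELECT * FROM Orders ORDER BY OrderID")
columns = [col[0] for col in cur.description]  # column names of the result
first_two = cur.fetchmany(2)                   # next 2 rows
rest = cur.fetchall()                          # whatever rows remain
con.close()

print(columns)
print(first_two)
print(rest)
```

`fetchmany` and `fetchall` consume the result set as they go, so `fetchall` here returns only the rows that `fetchmany` did not already return.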

Querying relational databases with pandas

>>> df = pd.read_sql_query("SELECT * FROM Orders", engine)
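`pd.read_sql_query` also accepts a plain `sqlite3` connection, which makes for a quick self-contained test. This is a sketch under that assumption, with a hypothetical in-memory `Orders` table rather than the Northwind database.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory table standing in for Northwind's Orders.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Orders (OrderID INTEGER, Freight REAL)")
con.executemany("INSERT INTO Orders VALUES (?, ?)",
                [(10248, 32.38), (10249, 11.61)])

# One call runs the query and builds the DataFrame, column names included.
df = pd.read_sql_query("SELECT * FROM Orders", con)
con.close()

print(df)
```

Compared with the manual version, there is no need to call `fetchall()` or to set `df.columns` afterwards.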



Help

>>> np.info(np.ndarray.dtype)
>>> help(pd.read_csv)

Exploring Dictionaries

Accessing Elements with Functions

>>> print(mat.keys())         # Print dictionary keys
>>> for key in data.keys():   # Print dictionary keys
...     print(key)
meta
quality
strain
>>> pickled_data.values()     # Return dictionary values
>>> print(mat.items())        # Returns items in list format of (key, value) tuple pairs

Accessing Data Items with Keys

>>> for key in data['meta'].keys():   # Explore the HDF5 structure
...     print(key)
Description
DescriptionURL
Detector
Duration
GPSstart
Observatory
Type
UTCstart
>>> print(data['meta']['Description'][()])   # Retrieve the value for a key (.value was removed in h5py 3.0)
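Objects returned by `pickle.load` and `scipy.io.loadmat` are often ordinary dictionaries, and HDF5 files support the same key-based access. The sketch below uses a plain nested dictionary as a stand-in; its keys mirror the HDF5 example, but the values are invented.

```python
# Hypothetical nested dictionary standing in for a loaded file.
loaded = {
    "meta": {"Description": "strain data", "Duration": 4096},
    "quality": [1, 1, 0],
    "strain": [0.1, -0.2, 0.3],
}

top_keys = list(loaded.keys())           # names of the top-level entries
meta_keys = list(loaded["meta"].keys())  # one level down
description = loaded["meta"]["Description"]
pairs = list(loaded["meta"].items())     # (key, value) tuples

print(top_keys)
print(meta_keys)
print(description)
print(pairs)
```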

Magic Commands

!ls        # List directory contents of files and directories
%cd ..     # Change current working directory
%pwd       # Return the current working directory path

NumPy Arrays

>>> data_array.dtype    # Data type of array elements
>>> data_array.shape    # Array dimensions
>>> len(data_array)     # Length of array
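A quick way to check what an import produced is to inspect these three attributes on the result. The array below is a hypothetical stand-in for imported data.

```python
import numpy as np

# Hypothetical 2 x 3 array standing in for data loaded from a file.
data_array = np.array([[1.0, 2.0, 3.0],
                       [4.0, 5.0, 6.0]])

print(data_array.dtype)   # float64
print(data_array.shape)   # (2, 3)
print(len(data_array))    # 2
```

Note that `len()` reports only the size of the first dimension (the number of rows here), not the total number of elements.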
