
PYTHON

2 Easy Ways to Get Tables From a Website with Pandas

An overview of pd.read_html and pd.read_clipboard

Byron Dolon

May 15 · 5 min read

Image created by @siscadraws (Instagram)

The pandas library is well known for its easy-to-use data analysis capabilities. It’s equipped with advanced indexing, DataFrame joining and data aggregation features. Pandas also has a comprehensive I/O API that you can use to input data from various sources and output data to various formats.
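As a small illustration of that I/O API, here is a minimal sketch that reads CSV text into a DataFrame and writes it back out as JSON. The CSV content and the variable names are made up for this example; only pd.read_csv and DataFrame.to_json come from pandas itself.

```python
from io import StringIO

import pandas as pd

# Made-up CSV text standing in for an input file (assumption for this sketch).
CSV_TEXT = "name,score\nalpha,1\nbeta,2\n"

# Input: parse the CSV text into a DataFrame.
frame = pd.read_csv(StringIO(CSV_TEXT))

# Output: serialize the same DataFrame as a JSON string, one object per row.
as_json = frame.to_json(orient="records")
print(as_json)
```

The same pattern of one reader function in and one writer method out applies to the other formats pandas supports.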

There are many occasions when you just need to get a table from a website to use in your analysis. Here’s a look at how you can use the pandas read_html and read_clipboard functions to get tables from websites with just a couple of lines of code.

Note: before trying any of the code below, don’t forget to import pandas.

import pandas as pd

1. pandas.read_html()

Let’s try getting this table with key Tesla executives for this example:

Yahoo Finance table of Elon Musk and other Tesla executives information

The read_html function has this description:

Read HTML tables into a list of DataFrame objects.

The function searches the input you provide (here, a URL) for HTML <table> elements and their related tags. It always returns a list, even if the site only has one table. To use the function, all you need to do is pass the URL of the site you want as the first argument. Running the function for the Yahoo Finance site looks like this:

pd.read_html('https://finance.yahoo.com/quote/TSLA/profile?p=TSLA')

Raw output of read_html

To get a DataFrame from this list, you only need to make one addition:

pd.read_html('https://finance.yahoo.com/quote/TSLA/profile?p=TSLA')[0]

Adding the ‘[0]’ selects the first element in the list. There is only one element in our list, and it is a DataFrame object. Running this code gives you this output:

Output of read_html with list index selection
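To try the list-then-index pattern without depending on a live page, here is a minimal sketch that runs read_html on a small inline HTML document. The HTML, the placeholder rows, and the helper name read_tables are all invented for this example and are not the Yahoo Finance data. It also assumes an HTML parser that pandas can use (such as lxml) is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for a downloaded page: a single <table> whose
# rows are placeholders, not real executive data.
SAMPLE_HTML = """
<html><body>
<table>
  <thead>
    <tr><th>Name</th><th>Title</th></tr>
  </thead>
  <tbody>
    <tr><td>Person A</td><td>Role A</td></tr>
    <tr><td>Person B</td><td>Role B</td></tr>
  </tbody>
</table>
</body></html>
"""


def read_tables(html_text):
    """Parse every <table> in html_text and return a list of DataFrames."""
    # Wrapping the text in StringIO hands read_html a file-like object;
    # recent pandas versions warn when given a literal HTML string.
    return pd.read_html(StringIO(html_text))
```

Calling read_tables(SAMPLE_HTML) gives back a list with one DataFrame in it, and read_tables(SAMPLE_HTML)[0] picks that DataFrame out, just like the [0] in the URL example.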

...

Now, let’s try getting this table with summary statistics for the Tesla stock:

Yahoo Finance summary table for Tesla stock

We’ll try the same code as before:

pd.read_html('https://finance.yahoo.com/quote/TSLA?p=TSLA')
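Because read_html always returns a list, one way to see what came back from a page is to check the shape of each DataFrame before picking one. The sketch below does that on a made-up two-table document; the HTML, the placeholder values, and the helper names summarize_tables and tables_matching are assumptions for illustration. It also shows the match parameter of read_html, which keeps only tables whose text matches a given string or regex.

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables; every value is a placeholder.
TWO_TABLE_HTML = """
<html><body>
<table>
  <thead><tr><th>Metric</th><th>Value</th></tr></thead>
  <tbody>
    <tr><td>Metric A</td><td>1</td></tr>
    <tr><td>Metric B</td><td>2</td></tr>
  </tbody>
</table>
<table>
  <thead><tr><th>Metric</th><th>Value</th></tr></thead>
  <tbody>
    <tr><td>Metric C</td><td>3</td></tr>
  </tbody>
</table>
</body></html>
"""


def summarize_tables(html_text):
    """Return the (rows, columns) shape of every table found, in page order."""
    tables = pd.read_html(StringIO(html_text))
    return [table.shape for table in tables]


def tables_matching(html_text, text):
    """Return only the tables that contain text matching `text`."""
    # `match` accepts a string or regex; tables without matching text
    # are left out of the returned list.
    return pd.read_html(StringIO(html_text), match=text)
```

Here summarize_tables(TWO_TABLE_HTML) reports two tables, and tables_matching(TWO_TABLE_HTML, "Metric C") narrows the list down to the second one, so you can still finish with [0] to get a single DataFrame.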


Published in Towards Data Science, a Medium publication sharing concepts, ideas, and codes.
