Zero to data scientist, Study Guides, Projects, Research of Data Analysis & Statistical Methods

This document covers the basics of the data science field, with coding in Python and SQL. It also covers web scraping with tools such as Selenium, and serves as a guide to the data analyst role.

Typology: Study Guides, Projects, Research

2021/2022

Available from 09/28/2022

haris-ahmed-inamdar


Scraping Zomato's Top 100 Restaurants using Selenium

Launched in 2010, Zomato is a technology platform that connects customers, restaurant partners and delivery partners, serving their multiple needs.

Customers use the platform to search for and discover restaurants, read and write customer-generated reviews, view and upload photos, order food delivery, book a table, and make payments while dining out at restaurants.

On the other hand, the platform provides restaurant partners with industry-specific marketing tools which enable them to engage and acquire customers to grow their business, while also providing a reliable and efficient last-mile delivery service. If you have not come across "ZOMATO" yet, welcome to planet Earth, and do check out Zomato.

Let's walk you through 'WEB SCRAPING'!!

Web scraping is an automatic method to obtain large amounts of data from websites. Most of this data is unstructured data in an HTML format which is then converted into structured data in a spreadsheet or a database so that it can be used in various applications.

There are many different ways to perform web scraping to obtain data from websites. These include using online services, particular APIs, or even writing your own web scraping code from scratch. Many large websites, like Google, Twitter, Facebook, Stack Overflow, etc., have APIs that allow you to access their data in a structured format.

Using an API is the best option, but other sites either don't allow users to access large amounts of data in a structured form or are simply not that technologically advanced. In that situation, it's best to use web scraping to collect the data from the website. To learn more, check out webscraping.
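The conversion from unstructured HTML to structured rows can be sketched without a browser. The snippet below is a minimal illustration, not the project's code: it uses Python's standard html.parser instead of Selenium, and the HTML sample, the class names "place" and "rating", and the helper names are all made up for the example.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical HTML standing in for a scraped page; this is not Zomato's real markup.
SAMPLE_HTML = """
<ul>
  <li class="place"><a href="/r/1">Cafe One</a><span class="rating">4.5</span></li>
  <li class="place"><a href="/r/2">Diner Two</a><span class="rating">4.1</span></li>
</ul>
"""


class PlaceParser(HTMLParser):
    """Collect one dict per <li class="place"> element."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # field whose text we are currently reading

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "place":
            self.rows.append({"name": "", "rating": "", "link": ""})
        elif tag == "a" and self.rows:
            self.rows[-1]["link"] = attrs.get("href", "")
            self._field = "name"
        elif tag == "span" and attrs.get("class") == "rating" and self.rows:
            self._field = "rating"

    def handle_endtag(self, tag):
        if tag in ("a", "span"):
            self._field = None

    def handle_data(self, data):
        if self._field and self.rows:
            self.rows[-1][self._field] += data.strip()


def html_to_rows(html):
    """Parse the HTML and return a list of row dictionaries."""
    parser = PlaceParser()
    parser.feed(html)
    return parser.rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "rating", "link"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = html_to_rows(SAMPLE_HTML)
print(rows_to_csv(rows))
```

Each list item becomes one row with a name, a rating and a link, which is the same tabular shape this project aims for.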

Objective:

Scrape the top 100 listings on Zomato by parsing the information from the website into tabular data.

List of details we are looking for on the website:

1-Top 100 Listings Of Restaurants For Each Location.

2-The 'Name' Of The Restaurants For Each Location.

3-The 'Ratings' Of Dining At The Restaurants For Each Location.

4-The 'Link' Of Restaurants For Each Location.

Project Code On Replit

The code used for this project is publicly available on the Replit platform. Feel free to explore the code and make changes to improve it and make it more efficient. Let's get on the road and see how the details are fetched and scraped for this project.

Replit Platform

The List Of Packages Used

FIRST-- SELENIUM -- what is selenium

SECOND -- PANDAS -- what is pandas

THIRD -- TIME -- why do we use TIME

FOURTH -- OS -- why do we use OS

Let's Discuss The Steps In The Project

1ST STEP

At the beginning of the project, we import the required packages, as shown below:-

import os
import pandas as pd
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

2ND STEP

Let's create a function that builds the webdriver we will use to extract webpage information. The driver function is as follows:-

def get_driver():
    chrome_options = Options()
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-dev-shm-usage')
    # To access Zomato's website we need to set a standard 'user-agent';
    # the site cannot be accessed without one. Learn more about user-agent setup.
    # Note: the user-agent string below is truncated in the source.
    user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko)'
    chrome_options.add_argument('user-agent={0}'.format(user_agent))
    driver = webdriver.Chrome(options=chrome_options)
    return driver

# calling the driver to carry out further steps
driver = get_driver()

3RD STEP

Creating a helper function to get the list of restaurant names from the website. We call it 'res_name(driver)':-

def res_name(driver):
    ...  # the function body is not shown in the source

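4TH STEP

Creating a helper function to get the list of restaurant links from the website. We call it 'res_url(driver)':-

def res_url(driver):
    place_divs_tag = 'sc-bke1zw-0'
    places = driver.find_element(By.CLASS_NAME, place_divs_tag)
    tags = places.find_elements(By.CLASS_NAME, 'sc-bke1zw-1')
    urls = []
    for i in tags:
        urls.append(i.find_element(By.TAG_NAME, "a").get_attribute('href'))
    return urls[:100]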

5TH STEP

Creating a helper function to get the list of ratings from the website. We call it 'res_ratings(driver)':-

def res_ratings(driver):
    place_divs_tag = 'sc-bke1zw-0'
    places = driver.find_element(By.CLASS_NAME, place_divs_tag)
    tags = places.find_elements(By.CLASS_NAME, 'sc-bke1zw-1')
    ratings = []
    for i in tags:
        try:
            ratings.append(i.find_element(By.CLASS_NAME, 'sc-1q7bklc-5').text)
        except NoSuchElementException:
            # no rating element on this card; keep a placeholder so the rows stay aligned
            ratings.append('.')
    return ratings[:100]

To avoid an unhandled exception when a listing has no rating element, we wrap the lookup in 'TRY AND EXCEPT' and store a placeholder instead. Here you can learn more about try and except.
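The try/except pattern above can be sketched in isolation. This is an illustrative example under stated assumptions: FakeElement and text_or_default are hypothetical names that do not appear in the project, and the fallback exception class exists only so the sketch also runs where Selenium is not installed. The point it shows is catching the specific NoSuchElementException rather than every possible error.

```python
try:
    from selenium.common.exceptions import NoSuchElementException
except ImportError:
    class NoSuchElementException(Exception):
        """Stand-in so this sketch also runs without Selenium installed."""


class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement, for illustration only."""

    def __init__(self, text="", children=None):
        self.text = text
        self._children = children or {}

    def find_element(self, by, value):
        # Mimic Selenium: raise when the requested child does not exist.
        if value not in self._children:
            raise NoSuchElementException(f"no element with {by}={value!r}")
        return self._children[value]


def text_or_default(element, by, value, default="."):
    """Return the child's text, or `default` when the child is missing."""
    try:
        return element.find_element(by, value).text
    except NoSuchElementException:
        return default


cards = [
    FakeElement(children={"rating": FakeElement(text="4.5")}),
    FakeElement(),  # a card without a rating
]
# "class name" is the string value behind Selenium's By.CLASS_NAME.
ratings = [text_or_default(card, "class name", "rating") for card in cards]
print(ratings)
```

Catching only the missing-element case means other problems, such as a lost browser session, still surface instead of silently turning into placeholders.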

6TH STEP

We create a parser function named "get_all_cities()" to extract the required details (NAME, RATINGS, LINK) from the website in the form of a dictionary. The function is written so that it works irrespective of the location, e.g. mumbai, pune, bangalore, delhi, chandigarh, etc. It gives us the details that were the objective of this project, and it can be run for any location; in this case we are scraping for 'Mumbai, Bangalore, Pune'.

def get_all_cities():
    cities = ['mumbai', 'bangalore', 'pune']
    dic = {'NAME': [], 'RATINGS': [], 'LINK': []}
    for i in cities:
        base_url = 'https://www.zomato.com/' + i + '/great-food-no-bull'
        driver.get(base_url)
        dic['NAME'].extend(res_name(driver))
        dic['RATINGS'].extend(res_ratings(driver))
        dic['LINK'].extend(res_url(driver))
    return dic
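The per-city aggregation can be tried out without a browser by passing the scraping step in as a function. The sketch below is an assumption-laden variant, not the project's code: city_url, collect and fake_fetch are hypothetical names, and fake_fetch returns made-up values in place of driver.get plus the three helper functions.

```python
BASE_URL = "https://www.zomato.com/"
COLLECTION_PATH = "/great-food-no-bull"


def city_url(city):
    """Build the collection URL for one city, as in get_all_cities()."""
    return BASE_URL + city + COLLECTION_PATH


def collect(cities, fetch):
    """Aggregate results for every city into one dictionary of columns.

    `fetch` is a hypothetical callable: fetch(url) -> (names, ratings, links).
    """
    dic = {"NAME": [], "RATINGS": [], "LINK": []}
    for city in cities:
        names, ratings, links = fetch(city_url(city))
        dic["NAME"].extend(names)
        dic["RATINGS"].extend(ratings)
        dic["LINK"].extend(links)
    return dic


def fake_fetch(url):
    """Made-up data standing in for a real page visit."""
    return (["A", "B"], ["4.5", "."], [url + "/a", url + "/b"])


result = collect(["mumbai", "pune"], fake_fetch)
print(result["LINK"])
```

Because every city extends the same three lists, the columns stay the same length as long as each fetch returns equally long lists.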

7TH STEP

We create a pandas DataFrame from the parsed data and export it to a CSV file named best100.csv, which gives the expected result.
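The export step might look like the sketch below. The source does not show the actual call, so treat this as one plausible version: the sample dictionary is made up to stand in for the output of get_all_cities(), index=False is an assumption, and the file is written to a temporary directory here rather than the working directory.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample standing in for the dictionary returned by get_all_cities().
dic = {
    "NAME": ["Cafe One", "Diner Two"],
    "RATINGS": ["4.5", "."],
    "LINK": ["https://example.com/one", "https://example.com/two"],
}

# Each dictionary key becomes a column; all lists must have the same length.
df = pd.DataFrame(dic)

# The guide's file name is best100.csv; a temporary directory is used for this sketch.
out_path = os.path.join(tempfile.mkdtemp(), "best100.csv")
df.to_csv(out_path, index=False)  # index=False leaves out the row-number column

# Read the file back to confirm what was written.
round_trip = pd.read_csv(out_path)
print(round_trip.shape)
```

If the three lists differ in length, pd.DataFrame raises a ValueError, which is a useful early sign that one of the helper functions missed some listings.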

SUMMARY

It is quite fascinating how much ease web scraping brings to the life of all coders. Summing up, we essentially built the code in the following steps:

-we set up the required packages: selenium, pandas, time and os.

-we created helper functions to get the names, ratings and URLs for the top 100 listings.