Zero to data scientist, Study Guides, Projects, Research of Data Analysis & Statistical Methods

University of Mumbai Data Analysis & Statistical Methods

This document covers the basics of the data science field, with coding in Python and SQL. It also covers web scraping with tools such as Selenium, and serves as a guide to the data analyst role.

Typology: Study Guides, Projects, Research

2021/2022

Available from 09/28/2022

haris-ahmed-inamdar 🇮🇳


Scraping Zomato's Top 100 Restaurants using Selenium

Launched in 2010, Zomato is a technology platform that connects customers, restaurant partners and delivery partners, serving their multiple needs.

Customers use the platform to search for and discover restaurants, read and write customer-generated reviews, view and upload photos, order food delivery, book a table and make payments while dining out at restaurants.

On the other hand, Zomato provides restaurant partners with industry-specific marketing tools which enable them to engage and acquire customers to grow their business, while also providing a reliable and efficient last-mile delivery service. If you have not come across "ZOMATO" yet, welcome to planet Earth, and do check out Zomato.

Let's walk you through 'WEB SCRAPING'!!

Web scraping is an automated method of obtaining large amounts of data from websites. Most of this data is unstructured data in HTML format, which is then converted into structured data in a spreadsheet or a database so that it can be used in various applications.

There are many different ways to perform web scraping. These include using online services, using particular APIs, or even writing your own scraping code from scratch. Many large websites, such as Google, Twitter, Facebook and Stack Overflow, have APIs that allow you to access their data in a structured format.

Using an API is the best option when one exists, but other sites either don't allow users to access large amounts of data in a structured form or are simply not that technologically advanced. In that situation, it's best to scrape the website for the data.
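To make the "unstructured HTML to structured data" idea concrete, here is a minimal sketch that uses only Python's standard library. The markup in SAMPLE_HTML, the class names 'name' and 'rating', and the ListingParser class are all made up for illustration; they are not Zomato's real page structure, and the project itself uses Selenium rather than this parser.

```python
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collect one {'name', 'rating'} row per card in the sample markup."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows, one dict per listing
        self._field = None    # field currently being read, if any
        self._current = {}    # row under construction

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class", "")
        if tag == "span" and css_class in ("name", "rating"):
            self._field = css_class

    def handle_data(self, data):
        # Only keep text that sits inside one of the spans we care about
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "div" and self._current:
            self.rows.append(self._current)
            self._current = {}


# Hypothetical markup, invented for this example
SAMPLE_HTML = """
<div class="card"><span class="name">Cafe One</span><span class="rating">4.5</span></div>
<div class="card"><span class="name">Cafe Two</span><span class="rating">4.1</span></div>
"""

parser = ListingParser()
parser.feed(SAMPLE_HTML)
rows = parser.rows
```

Each entry in rows is a dictionary with the same keys, which is the kind of structured record that can go straight into a spreadsheet or a database table.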

Objective:

Scrape the top 100 listings on Zomato by parsing the information from the website into tabular data.

List of details we are looking for on the website:

1-Top 100 Listings Of Restaurants For Each Location.

2-The 'Name' Of The Restaurants For Each Location.

3-The 'Ratings' Of Dining At The Restaurants For Each Location.

4-The 'Link' Of Restaurants For Each Location.

Project Code On Replit

The code used for this project is publicly available on the Replit platform. Feel free to explore the code and make changes to improve it and make it more efficient. Let's get on the road and see how the details are fetched and scraped for this project.

Replit Platform

The List Of Packages Used

FIRST-- SELENIUM -- what is selenium

SECOND -- PANDAS -- what is pandas

THIRD -- TIME -- why do we use TIME

FOURTH -- OS -- why do we use OS

Let's Discuss The Steps In The Project

1ST STEP

At the beginning of the project, we import the required packages, as shown below:-

    import os
    import time

    import pandas as pd
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.common.exceptions import NoSuchElementException

2ND STEP

Let's create a function that builds the webdriver we will use to extract webpage information. The driver function is as follows:-

    def get_driver():
        chrome_options = Options()
        chrome_options.add_argument('--no-sandbox')
        chrome_options.add_argument('--headless')
        chrome_options.add_argument('--disable-dev-shm-usage')
        # Zomato's website cannot be accessed without a standard 'user-agent',
        # so we set one explicitly.
        # NOTE: the user-agent string is cut off in the source after '(KHTML, like Gecko)'.
        user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko)'
        chrome_options.add_argument('user-agent={0}'.format(user_agent))
        driver = webdriver.Chrome(options=chrome_options)
        return driver

    # calling the driver to carry out further steps
    driver = get_driver()

3RD STEP

Creating a helper function to get the list of restaurant names from the website. We call it 'res_name(driver)':-

    def res_name(driver):
        ...  # the body of this function is not included in the source

4TH STEP

Creating a helper function to get the list of restaurant links from the website. We call it 'res_url(driver)':-

    def res_url(driver):
        place_divs_tag = 'sc-bke1zw-0'
        places = driver.find_element(By.CLASS_NAME, place_divs_tag)
        tags = places.find_elements(By.CLASS_NAME, 'sc-bke1zw-1')
        urls = []
        for i in tags:
            # each listing card wraps its details in an <a> tag pointing to the restaurant page
            urls.append(i.find_element(By.TAG_NAME, "a").get_attribute('href'))
        return urls[:100]
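Since the body of 'res_name(driver)' is not shown, the following is only a guess at its shape, modelled on the other two helpers. The container and card class names come from the source, but NAME_CLASS is a made-up placeholder, not Zomato's real class. The function name res_name_sketch and the FakeElement stub are also hypothetical; the stub exists only so the logic can be tried without a browser.

```python
# Selenium's By.CLASS_NAME is the plain string "class name"; using the string
# keeps this sketch importable on a machine without Selenium installed.
CLASS_NAME = "class name"

CONTAINER_CLASS = "sc-bke1zw-0"  # from the source
CARD_CLASS = "sc-bke1zw-1"       # from the source
NAME_CLASS = "restaurant-name"   # HYPOTHETICAL: the real class is not shown


def res_name_sketch(driver, name_class=NAME_CLASS, limit=100):
    """Return up to `limit` names, with '.' where a card has no name element."""
    container = driver.find_element(CLASS_NAME, CONTAINER_CLASS)
    cards = container.find_elements(CLASS_NAME, CARD_CLASS)
    names = []
    for card in cards:
        try:
            names.append(card.find_element(CLASS_NAME, name_class).text)
        except Exception:  # with Selenium this would be NoSuchElementException
            names.append(".")
    return names[:limit]


class FakeElement:
    """Minimal stand-in for a WebElement, only for the offline dry run."""

    def __init__(self, text="", children=None):
        self.text = text
        self.children = children or {}  # class name -> list of FakeElement

    def find_elements(self, by, value):
        return self.children.get(value, [])

    def find_element(self, by, value):
        matches = self.find_elements(by, value)
        if not matches:
            raise LookupError(value)
        return matches[0]


cards = [
    FakeElement(children={NAME_CLASS: [FakeElement("Cafe One")]}),
    FakeElement(),  # a card without a name element
]
page = FakeElement(children={CONTAINER_CLASS: [FakeElement(children={CARD_CLASS: cards})]})
names = res_name_sketch(page)
```

With a real driver, you would pass the object returned by 'get_driver()' after loading a page, and replace NAME_CLASS with the class found by inspecting the page.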

5TH STEP

Creating a helper function to get the list of ratings from the website. We call it 'res_ratings(driver)':-

    def res_ratings(driver):
        place_divs_tag = 'sc-bke1zw-0'
        places = driver.find_element(By.CLASS_NAME, place_divs_tag)
        tags = places.find_elements(By.CLASS_NAME, 'sc-bke1zw-1')
        ratings = []
        for i in tags:
            try:
                ratings.append(i.find_element(By.CLASS_NAME, 'sc-1q7bklc-5').text)
            except NoSuchElementException:
                # no rating element on this card: keep a placeholder so the
                # list stays aligned with the names and links
                ratings.append('.')
        return ratings[:100]

To avoid running into an exception while running this code, we make use of 'TRY AND EXCEPT': if a listing has no rating element, the lookup raises an exception, and the 'except' branch records the placeholder '.' instead of stopping the scrape.
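The same try-and-except pattern can be shown on its own, without a browser. This is a small sketch in which a plain dictionary lookup stands in for the Selenium element lookup. The helper name text_or_placeholder and its parameters are illustrative only and are not part of the project code.

```python
def text_or_placeholder(lookup, placeholder=".", missing=(LookupError,)):
    """Return lookup(), or `placeholder` if it raises one of `missing`."""
    try:
        return lookup()
    except missing:
        # Only the expected "not found" errors are swallowed;
        # anything else still surfaces as a real error.
        return placeholder


card_with_rating = {"rating": "4.5"}
card_without_rating = {}

found = text_or_placeholder(lambda: card_with_rating["rating"])
fallback = text_or_placeholder(lambda: card_without_rating["rating"])
```

Catching only the expected exception type is a deliberate choice here: a bare 'except' would also hide unrelated bugs, such as a typo in a variable name.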

6TH STEP

We create a parser function named "get_all_cities()" to extract the required details (NAME, RATINGS, LINK) from the website in the form of a dictionary. The function works irrespective of the location, e.g. Mumbai, Pune, Bangalore, Delhi or Chandigarh, so the details that are the objective of this project can be collected for any location. In this case we are scraping for Mumbai, Bangalore and Pune.

    def get_all_cities():
        cities = ['mumbai', 'bangalore', 'pune']
        dic = {'NAME': [], 'RATINGS': [], 'LINK': []}
        for i in cities:
            # load the listing page for this city, then run all three helpers on it
            base_url = 'https://www.zomato.com/' + i + '/great-food-no-bull'
            driver.get(base_url)
            dic['NAME'].extend(res_name(driver))
            dic['RATINGS'].extend(res_ratings(driver))
            dic['LINK'].extend(res_url(driver))
        return dic
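Two small pieces of the step above can be checked without a browser: how the URL for each city is built, and whether the three lists in the dictionary end up with the same length, which a table needs. The sketch below is one possible way to do that. The helper names collection_url, column_lengths and is_rectangular are hypothetical, and the lower-casing of the city name is an added assumption.

```python
BASE = "https://www.zomato.com/"      # from the source
COLLECTION = "/great-food-no-bull"    # from the source


def collection_url(city):
    """Build the listing URL for one city, trimming spaces and lower-casing."""
    return BASE + city.strip().lower() + COLLECTION


def column_lengths(dic):
    """Map each column name to the number of values collected for it."""
    return {column: len(values) for column, values in dic.items()}


def is_rectangular(dic):
    """True when every column has the same length, as a table requires."""
    return len(set(column_lengths(dic).values())) <= 1


url = collection_url(" Mumbai ")
good = {"NAME": ["a"], "RATINGS": ["4.2"], "LINK": ["u"]}
bad = {"NAME": ["a", "b"], "RATINGS": ["4.2"], "LINK": ["u"]}
```

Calling is_rectangular on the result of 'get_all_cities()' before building a table would flag the case where one helper returned fewer items than the others for some city.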

7TH STEP

We create a pandas DataFrame from the parsed data and export it to a CSV file named best100.csv, which gives the expected result.
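The code for this step is not shown, so the following is only a sketch of what the export could look like. The dictionary here is a hand-written stand-in for the result of 'get_all_cities()', with invented restaurant names and example.com links. The file name best100.csv is the one named in the text.

```python
import pandas as pd

# Hypothetical stand-in for the dictionary returned by get_all_cities()
dic = {
    "NAME": ["Cafe One", "Cafe Two"],
    "RATINGS": ["4.5", "."],
    "LINK": ["https://example.com/one", "https://example.com/two"],
}

df = pd.DataFrame(dic)
# index=False keeps the row numbers out of the file
df.to_csv("best100.csv", index=False)

# Reading the file back is a quick check that the export round-trips;
# dtype=str keeps ratings such as '.' and '4.5' as plain text
check = pd.read_csv("best100.csv", dtype=str)
```

Note that pd.DataFrame(dic) needs all three lists to have the same length; if they do not, pandas raises a ValueError rather than writing a misaligned file.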

SUMMARY

It is quite fascinating how much ease web scraping brings to the life of all CODERS. Summing up, we essentially built the code in the following steps:

-we set up the required packages: selenium, pandas, time and os.

-we created helper functions to get the names, ratings and URLs for the top 100 listings.
