





This document covers the basics of the data science field, with coding in Python and SQL. It also covers web scraping with tools such as Selenium, and serves as a guide to the data analyst role.
Typology: Study Guides, Projects, Research






Launched in 2010, Zomato is a technology platform that connects customers, restaurant partners and delivery partners, serving their multiple needs.
Customers use the platform to search for and discover restaurants, read and write customer-generated reviews, view and upload photos, order food delivery, book a table and make payments while dining out at restaurants.
Zomato also provides restaurant partners with industry-specific marketing tools that help them engage and acquire customers and grow their business, along with a reliable and efficient last-mile delivery service. If you have not come across "ZOMATO" yet, welcome to planet Earth, and do check out Zomato.
Web scraping is an automatic method to obtain large amounts of data from websites. Most of this data is unstructured data in an HTML format which is then converted into structured data in a spreadsheet or a database so that it can be used in various applications.
There are many different ways to perform web scraping to obtain data from websites. These include using online services, dedicated APIs, or writing your own scraping code from scratch. Many large websites, like Google, Twitter, Facebook and Stack Overflow, have APIs that allow you to access their data in a structured format.
Using an API is the best option when one is available, but other sites either do not allow users to access large amounts of data in a structured form or are simply not that technologically advanced. In that situation, web scraping is the way to get data from the website.
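As a rough illustration of turning unstructured HTML into structured records, here is a minimal sketch that uses only Python's standard-library `html.parser`. The HTML snippet, the `ListingParser` class and the `parse_listings` helper are hypothetical and made up for this example; the project itself uses Selenium against the live site.

```python
from html.parser import HTMLParser

# Hypothetical HTML snippet; real pages are larger and messier.
SAMPLE_HTML = """
<ul>
  <li class="listing"><a href="/mumbai/cafe-one">Cafe One</a></li>
  <li class="listing"><a href="/mumbai/spice-hub">Spice Hub</a></li>
</ul>
"""


class ListingParser(HTMLParser):
    """Collects one {'name', 'link'} record per <a> tag."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._current_link = None  # href of the <a> tag we are inside, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_link = dict(attrs).get("href")

    def handle_data(self, data):
        text = data.strip()
        # Only keep text that appears inside a link.
        if self._current_link is not None and text:
            self.rows.append({"name": text, "link": self._current_link})

    def handle_endtag(self, tag):
        if tag == "a":
            self._current_link = None


def parse_listings(html):
    """Return a list of structured records extracted from raw HTML."""
    parser = ListingParser()
    parser.feed(html)
    return parser.rows


rows = parse_listings(SAMPLE_HTML)
for row in rows:
    print(row["name"], row["link"])
```

Each record is a plain dictionary, so the result can be loaded into a spreadsheet or database without further parsing.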
The objective is to scrape the top 100 listings on Zomato for each location and parse the information from the website into tabular data:
1-Top 100 Listings Of Restaurants For Each Location.
2-The 'Name' Of The Restaurants For Each Location.
3-The 'Ratings' Of Dining At The Restaurants For Each Location.
4-The 'Link' Of Restaurants For Each Location.
The code used for this project is publicly available on the Replit platform. Feel free to explore the code and change it to make it better and more efficient. Let's walk through how the details are fetched and scraped for this project.
Replit Platform
FIRST-- SELENIUM -- what is selenium
SECOND -- PANDAS -- what is pandas
THIRD -- TIME -- why do we use TIME
FOURTH -- OS -- why do we use OS
import os
import pandas as pd
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException
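def res_name(driver):
    # The body of res_name is missing from the source. It is expected to return
    # the names of the first 100 restaurants, like res_url and res_ratings do.
    raise NotImplementedError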
def get_driver():
    # Run Chrome headless (no visible browser window).
    chrome_options = Options()
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-dev-shm-usage')
    # Zomato's website cannot be accessed without a standard 'user-agent', so we set one explicitly.
    # Note: the user-agent string is truncated in the source after "(KHTML, like Gecko)".
    user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko)'
    chrome_options.add_argument('user-agent={0}'.format(user_agent))
    driver = webdriver.Chrome(options=chrome_options)
    return driver
#calling the driver to carry out further steps
driver=get_driver()
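def res_url(driver):
    # The function header and place_divs_tag line are missing from the source;
    # they are restored here from res_ratings and the res_url(driver) call in get_all_cities().
    place_divs_tag = 'sc-bke1zw-0'
    places = driver.find_element(By.CLASS_NAME, place_divs_tag)
    tags = places.find_elements(By.CLASS_NAME, 'sc-bke1zw-1')
    urls = []
    for i in tags:
        # Each listing card contains an <a> tag whose href is the restaurant link.
        urls.append(i.find_element(By.TAG_NAME, "a").get_attribute('href'))
    return urls[:100]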
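def res_ratings(driver):
    # Return the dining ratings of the first 100 restaurants on the current page.
    place_divs_tag = 'sc-bke1zw-0'
    places = driver.find_element(By.CLASS_NAME, place_divs_tag)
    tags = places.find_elements(By.CLASS_NAME, 'sc-bke1zw-1')
    ratings = []
    for i in tags:
        try:
            ratings.append(i.find_element(By.CLASS_NAME, 'sc-1q7bklc-5').text)
        except NoSuchElementException:
            # No rating element for this listing: keep a placeholder so the lists stay aligned.
            ratings.append('.')
    return ratings[:100]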
Some listings have no rating element, and looking one up would raise an exception and stop the scraper. To avoid this, the code wraps the lookup in a 'try and except' block: if the rating is not found, a placeholder '.' is appended instead.
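The same try/except pattern can be tried without a browser. The sketch below is only an analogy: plain dictionaries stand in for the Selenium elements, and `KeyError` stands in for Selenium's `NoSuchElementException` (which the script already imports). The `collect_ratings` function and the sample `cards` are hypothetical names for illustration.

```python
PLACEHOLDER = "."  # same placeholder the scraper appends when a rating is missing


def collect_ratings(cards):
    """Return one rating per card, using PLACEHOLDER where none exists."""
    ratings = []
    for card in cards:
        try:
            ratings.append(card["rating"])
        except KeyError:
            # Catch only the expected error so unrelated bugs still surface.
            ratings.append(PLACEHOLDER)
    return ratings


# Hypothetical sample data: the second card has no rating.
cards = [{"name": "Cafe One", "rating": "4.5"}, {"name": "Spice Hub"}]
print(collect_ratings(cards))
```

Catching a specific exception rather than using a bare `except:` keeps the output list the same length as the input while still letting unexpected errors show up.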
We create a parser function named "get_all_cities()" that extracts the required NAME, RATINGS and LINK details from the website and returns them as a dictionary. The function works irrespective of the location (for example Mumbai, Pune, Bangalore, Delhi or Chandigarh), so the objective of this project can be met for any city. In this case we are scraping for Mumbai, Bangalore and Pune.
def get_all_cities():
    cities = ['mumbai', 'bangalore', 'pune']
    dic = {'NAME': [], 'RATINGS': [], 'LINK': []}
    for i in cities:
        # Load the listing page for this city with the shared driver created above.
        base_url = 'https://www.zomato.com/' + i + '/great-food-no-bull'
        driver.get(base_url)
        dic['NAME'].extend(res_name(driver))
        dic['RATINGS'].extend(res_ratings(driver))
        dic['LINK'].extend(res_url(driver))
    return dic
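The dictionary returned by get_all_cities() holds three parallel lists, which is already close to a table: one column per key, one row per listing. As a hedged sketch of that last step, the code below writes such a dictionary out as CSV text using only the standard library. The sample values and the `dict_to_csv` helper are hypothetical; with pandas, which the project imports, the equivalent would typically be `pd.DataFrame(dic).to_csv(path, index=False)`.

```python
import csv
import io

# Hypothetical sample in the same shape that get_all_cities() returns.
dic = {
    "NAME": ["Cafe One", "Spice Hub"],
    "RATINGS": ["4.5", "."],
    "LINK": ["https://example.com/cafe-one", "https://example.com/spice-hub"],
}


def dict_to_csv(columns):
    """Turn a dict of equal-length lists into CSV text, one row per listing."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        # Misaligned columns would silently pair names with the wrong ratings.
        raise ValueError("all columns must have the same length")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns.keys())           # header row: NAME, RATINGS, LINK
    writer.writerows(zip(*columns.values()))  # one row per listing
    return buffer.getvalue()


csv_text = dict_to_csv(dic)
print(csv_text)
```

The length check matters here because the scraper appends a placeholder for missing ratings precisely to keep the three lists aligned.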
It is fascinating how much ease web scraping brings to the lives of coders. Summing up, we built the code in the following steps:
-We set up the required packages: selenium, pandas, time and os.
-We created helper functions to get the names, ratings and URLs of the top 100 listings.