jjdocker description steps, Schemes and Mind Maps of Law

Typology: Schemes and Mind Maps

2022/2023

Uploaded on 11/30/2024

chloe-khoury

Chapter 1 GENERAL CONCEPTS
1.2. What is Docker?
1.2.1. Why is it used?
Docker is a tool designed to make it easier to create, deploy, and run applications by
using containers. Containers allow developers to package an application with all its
dependencies (code, libraries, environment variables, etc.) so that it works consistently across
different environments. This is especially useful when moving applications from one machine
to another, like from a developer's local environment to a production server.
Docker solves the problem of inconsistencies between environments by bundling everything
needed to run an application inside a container, which is lightweight, portable, and isolated
from the host machine. Containers can run on any machine that has Docker installed, ensuring
that our application behaves the same regardless of where it's deployed.
1.2.2. How Docker is Used in Web Scraping:
In web scraping, Docker helps package your scraping environment (Python libraries, scraping
tools, browser drivers like ChromeDriver for Selenium, etc.) into a single container. This eliminates
issues with software dependencies across different machines. Note that a container is a
lightweight, standalone executable package that includes everything needed to run an
application (e.g., source code, libraries, settings). For example, web scraping tools like
Selenium or BeautifulSoup often require specific dependencies (e.g., browser drivers, Python
packages). Docker makes sure everything is bundled correctly so our scraping code will run
smoothly wherever the container is executed.
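To make the dependency problem concrete, here is a minimal sketch of how one might check which versions of the scraping libraries an environment actually has. Running it on two machines (or on the host and inside a container) shows whether they match. The helper name installed_versions is hypothetical, and the package names are simply the ones a small scraper like the one in this chapter would need.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment.
            result[name] = None
    return result


if __name__ == "__main__":
    # Example package names; adjust to whatever the project really depends on.
    for name, ver in installed_versions(["requests", "beautifulsoup4"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

If the two environments print different versions, that difference is exactly the kind of inconsistency a container image is meant to remove.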
1.2.3. Docker Example for Web Scraping
Here’s how I started building a simple web scraping tool with Docker:
1. I installed Docker on my system from Docker's official site.
2. I created a directory for my scraping project and navigated to it:
mkdir webscraper
cd webscraper
3. I created a Python script (scraper.py) inside my project folder with the following code:
import requests
from bs4 import BeautifulSoup
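# Make a request to a website (the timeout keeps the script from hanging)
URL = 'https://example.com'
response = requests.get(URL, timeout=10)
# Stop early on HTTP errors such as 404 or 500
response.raise_for_status()
# Parse the HTML content
soup = BeautifulSoup(response.text, 'html.parser')
# Find the title of the page (soup.title is None if there is no <title> tag)
title = soup.title.string if soup.title else None
print(f"Page title: {title}")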
4. I created a requirements.txt file and added the libraries my project needs:
requests
beautifulsoup4
5. I created a Dockerfile with the instructions to build a Docker image for my scraper:
# Use an official Python runtime as a base image
FROM python:3.8-slim
# Set the working directory to /app
WORKDIR /app
# Copy the current directory contents into the container at /app
COPY . /app
# Install the dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Run the scraper script when the container starts
CMD ["python", "scraper.py"]
· FROM python:3.8-slim: This line tells Docker to use the official Python image
as the base for my container. The 3.8-slim version is a lightweight version of
Python 3.8, which is smaller in size and includes just enough libraries to run
Python applications. Using a slim image makes the container smaller, faster to
build, and more efficient.
· WORKDIR /app: This sets the working directory inside the container to /app.
Every subsequent command (like copying files or installing dependencies) will
happen within this directory.
· COPY . /app: This copies all files and folders from my local machine’s current
directory into the /app directory inside the container, allowing Docker to see
my scraper.py script, requirements.txt, and any other necessary files.
· RUN pip install --no-cache-dir -r requirements.txt: This installs the Python
packages listed in requirements.txt inside the container using pip. The
--no-cache-dir option ensures that pip doesn’t save cache files for installed
packages, keeping the container small and efficient.
· CMD ["python", "scraper.py"]: This specifies the command to run when the container starts, telling Docker to run the scraper.py Python script.

6. I built the Docker image by running the following command in my project directory:

docker build -t webscraper .

This command tells Docker to create an image named webscraper based on the Dockerfile in the current directory.

· -t webscraper: This flag assigns the name webscraper to the image. Any name works here; webscraper is just an example.
· . (dot): The dot at the end sets the build context to the current directory, which is also where Docker looks for the Dockerfile by default.

7. Once the image was built, I ran my scraper inside a Docker container:

docker run webscraper

This executes the scraper.py file inside the container and outputs the title of the web page. Docker creates a new container using the webscraper image and then executes the command specified in the Dockerfile.

8. After I am done, I can list and remove containers I no longer need (optional):

docker container ls -a       # List all containers, including stopped ones
docker rm <container_id>     # Remove a stopped container by ID

Stopped containers stay on disk until they are removed, so if many have accumulated, it’s a good idea to delete the ones that are no longer needed.
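The build, run, and clean-up steps above can also be driven from a small Python script. The sketch below is only an illustration: the helper names (build_command, run_command, build_and_run) are hypothetical, and the use of the docker run --rm flag, which removes the container automatically when it exits, is my own assumption about how one might avoid the manual clean-up step.

```python
import shutil
import subprocess

IMAGE_TAG = "webscraper"  # the example image name used in the steps above


def build_command(tag, context="."):
    """Return the argument list for `docker build -t <tag> <context>`."""
    return ["docker", "build", "-t", tag, context]


def run_command(tag, remove_after=True):
    """Return the argument list for `docker run`, optionally with --rm."""
    cmd = ["docker", "run"]
    if remove_after:
        # --rm deletes the container as soon as it exits.
        cmd.append("--rm")
    cmd.append(tag)
    return cmd


def build_and_run(tag=IMAGE_TAG):
    """Build the image and run it once; requires the docker CLI on PATH."""
    if shutil.which("docker") is None:
        raise RuntimeError("docker CLI not found on PATH")
    subprocess.run(build_command(tag), check=True)
    subprocess.run(run_command(tag), check=True)


if __name__ == "__main__":
    # Only print the commands here; call build_and_run() to execute them.
    print(" ".join(build_command(IMAGE_TAG)))
    print(" ".join(run_command(IMAGE_TAG)))
```

Keeping the command construction separate from the subprocess call makes it easy to inspect what would be executed before Docker is actually invoked.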

1.2.4. Benefits of Docker in Scraping:

· Consistency: The same code runs the same way on any machine where Docker is installed, eliminating issues caused by different environments.
· Dependency Management: All the necessary dependencies are packaged inside the Docker container.
· Portability: You can easily share the Docker image with others, and they can run the same code on their machines without any setup issues.
· Isolation: The containerized environment isolates the scraping tool from the host machine, so problems inside the container are unlikely to affect the host.

1.3. What is scraping Third-Party Apps/Websites?

This refers to collecting data from websites or apps that we do not own. It can be tricky because some websites block scraping, and others place legal restrictions on how their data may be collected. We typically scrape third-party apps and websites when we need information that they display publicly, such as prices, reviews, or product details, but that they do not expose through an API.
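A site's restrictions are mainly stated in its terms of service, which have to be read by a person. Many sites also publish a robots.txt file, and checking it is an easy first signal, though not a substitute for the terms. The sketch below uses Python's standard urllib.robotparser; the robots.txt content, the user agent name my-scraper, and the helper is_allowed are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/", "https://example.com/private/data"):
        print(url, is_allowed(SAMPLE_ROBOTS_TXT, "my-scraper", url))
```

Against a real site, the same parser can load the live file with set_url() and read() instead of an inlined string.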

To do it, we use tools like BeautifulSoup (for HTML), Selenium (for dynamic content), or APIs (if available) to extract data. However, it’s important to always check the website’s terms of service to make sure we’re not violating any rules.
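As a rough illustration of extracting publicly displayed details such as prices, the sketch below pulls the text of every span with the class price out of a page. To keep it dependency-free it uses the standard library's html.parser rather than BeautifulSoup, and the HTML snippet, the class name, and the helper extract_prices are all hypothetical. It also assumes price spans are not nested inside each other.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collect the text inside <span class="price"> elements."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "price" in classes:
            self._in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False

    def handle_data(self, data):
        # Keep only non-empty text found inside a price span.
        if self._in_price and data.strip():
            self.prices.append(data.strip())


def extract_prices(html):
    """Return the list of price strings found in the given HTML text."""
    parser = PriceParser()
    parser.feed(html)
    parser.close()
    return parser.prices


# Made-up page fragment standing in for a real product listing.
SAMPLE_HTML = """
<ul>
  <li><span class="name">Widget</span> <span class="price">$9.99</span></li>
  <li><span class="name">Gadget</span> <span class="price">$24.50</span></li>
</ul>
"""

if __name__ == "__main__":
    print(extract_prices(SAMPLE_HTML))
```

With BeautifulSoup the same idea is usually shorter, but the structure is the same: locate the elements that hold the data, then read their text.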