Introduction to Threading in Python: Concepts, Implementation, and Best Practices, Quizzes of Programming Languages


Typology: Quizzes

2022/2023

Uploaded on 01/22/2023

omnia-nabil-gharieb-ghonem


Introduction to threading

Review

What Is a Thread?

A thread is a separate flow of execution. This means that your program will have two things happening at once. But for most Python 3 implementations the different threads do not actually execute at the same time: they merely appear to.

Threads run on only one processor (generally)

It’s tempting to think of threading as having two (or more) different processors running on your program, each one doing an independent task at the same time. That’s almost right. The threads may be running on different processors, but they will only be running one at a time.

Getting multiple tasks running simultaneously requires a non-standard implementation of Python, writing some of your code in a different language, or using multiprocessing, which comes with some extra overhead.

Because of the way the CPython implementation of Python works, threading may not speed up all tasks. This is due to interactions with the GIL (global interpreter lock), which essentially limits execution to one Python thread at a time.

Threads are for I/O-bound programs

Tasks that spend much of their time waiting for external events are generally good candidates for threading. Problems that require heavy CPU computation and spend little time waiting for external events might not run faster at all.
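As a rough illustration of why waiting-heavy work benefits from threads, the sketch below times the same three tasks run one after another and then in threads. The function names are made up for this example, and time.sleep() is only a stand-in for real I/O such as a network request.

```python
import threading
import time

def fake_io_task(delay):
    # time.sleep() stands in for waiting on a network or disk operation.
    # While a thread sleeps, other threads are free to run.
    time.sleep(delay)

def run_sequential(n, delay):
    """Run n fake I/O tasks one after another; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(n):
        fake_io_task(delay)
    return time.perf_counter() - start

def run_threaded(n, delay):
    """Run n fake I/O tasks in n threads; return elapsed seconds."""
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_io_task, args=(delay,))
               for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"sequential: {run_sequential(3, 0.2):.2f}s")
    print(f"threaded:   {run_threaded(3, 0.2):.2f}s")
```

The threaded run should take roughly as long as a single task, because the waits overlap. A CPU-heavy function in place of the sleep would typically show little or no gain on standard CPython.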

This is true for code written in Python and running on the standard CPython implementation. If your threads are written in C they have the ability to release the GIL and run concurrently. If you are running on a different Python implementation, check with the documentation to see how it handles threads.

If you are running a standard Python implementation, writing in only Python, and have a CPU-bound problem, you should check out the multiprocessing module instead.

Starting a Thread

To start a separate thread, you create a Thread instance and then tell it to .start():

    import logging
    import threading
    import time

    def thread_function(name):
        logging.info("Thread %s: starting", name)
        time.sleep(2)
        logging.info("Thread %s: finishing", name)

    if __name__ == "__main__":
        format = "%(asctime)s: %(message)s"
        logging.basicConfig(format=format, level=logging.INFO, datefmt="%H:%M:%S")

        logging.info("Main : before creating thread")
        x = threading.Thread(target=thread_function, args=(1,))
        logging.info("Main : before running thread")
        x.start()
        logging.info("Main : wait for the thread to finish")
        # x.join()
        logging.info("Main : all done")

If you look around the logging statements, you can see that the main section is creating and starting the thread:

    x = threading.Thread(target=thread_function, args=(1,))
    x.start()

When you run this program as it is (with the x.join() call commented out), the output will look like this:

    $ ./single_thread.py
    Main : before creating thread
    Main : before running thread
    Thread 1: starting
    Main : wait for the thread to finish
    Main : all done
    Thread 1: finishing

Daemon Threads

In computer science, a daemon is a process that runs in the background.

Python threading has a more specific meaning for daemon. A daemon thread will shut down immediately when the program exits. One way to think about these definitions is to consider the daemon thread a thread that runs in the background without worrying about shutting it down.

If a program is running threads that are not daemons, then the program will wait for those threads to complete before it terminates. Threads that are daemons, however, are just killed wherever they are when the program is exiting.

Let’s look a little more closely at the output of your program above. The last two lines are the interesting bit. When you run the program, you’ll notice that there is a pause (of about 2 seconds) after main has printed its "all done" message and before the thread is finished.
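One way to see the difference between the two kinds of thread at program exit is to run a small program in a fresh interpreter, once with a regular thread and once with a daemon thread. This is a minimal sketch: the child script, the run_child helper and its parameters are hypothetical names invented for this example.

```python
import subprocess
import sys
import time

# Hypothetical child program: starts one worker thread, then lets main end.
CHILD_SCRIPT = """
import threading, time

def worker():
    time.sleep({seconds})
    print("worker finished")

threading.Thread(target=worker, daemon={daemon}).start()
print("main done")
"""

def run_child(daemon, seconds=1.0):
    """Run the child script in a new interpreter; return (stdout, elapsed)."""
    script = CHILD_SCRIPT.format(daemon=daemon, seconds=seconds)
    start = time.perf_counter()
    proc = subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True, text=True, check=True,
    )
    return proc.stdout, time.perf_counter() - start

if __name__ == "__main__":
    for daemon in (False, True):
        output, elapsed = run_child(daemon)
        print(f"daemon={daemon}: {output.splitlines()} after {elapsed:.2f}s")
```

With daemon=False the child waits for the worker, so both messages appear. With daemon=True the child exits as soon as main is done and the worker never gets to print.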

The harder way of starting multiple threads is the one you already know:

    import logging
    import threading
    import time

    def thread_function(name):
        logging.info("Thread %s: starting", name)
        time.sleep(2)
        logging.info("Thread %s: finishing", name)

    if __name__ == "__main__":
        format = "%(asctime)s: %(message)s"
        logging.basicConfig(format=format, level=logging.INFO, datefmt="%H:%M:%S")

        threads = list()
        for index in range(3):
            logging.info("Main : create and start thread %d.", index)
            x = threading.Thread(target=thread_function, args=(index,))
            threads.append(x)
            x.start()

        for index, thread in enumerate(threads):
            logging.info("Main : before joining thread %d.", index)
            thread.join()
            logging.info("Main : thread %d done", index)

This code uses the same mechanism you saw above to start a thread: create a Thread object and then call .start(). The program keeps a list of Thread objects so that it can wait for them later using .join().

Running this code multiple times will likely produce some interesting results. Here’s an example output from my machine:

    $ ./multiple_threads.py
    Main : create and start thread 0.
    Thread 0: starting
    Main : create and start thread 1.
    Thread 1: starting
    Main : create and start thread 2.
    Thread 2: starting
    Main : before joining thread 0.
    Thread 2: finishing
    Thread 1: finishing
    Thread 0: finishing
    Main : thread 0 done
    Main : before joining thread 1.
    Main : thread 1 done
    Main : before joining thread 2.
    Main : thread 2 done

If you walk through the output carefully, you’ll see all three threads getting started in the order you might expect, but in this case they finish in the opposite order! Multiple runs will produce different orderings. Look for the Thread x: finishing message to tell you when each thread is done.

The order in which threads are run is determined by the operating system and can be quite hard to predict. It may (and likely will) vary from run to run, so you need to be aware of that when you design algorithms that use threading.

Fortunately, Python gives you several primitives that you’ll look at later to help coordinate threads and get them running together. Before that, let’s look at how to make managing a group of threads a bit easier.

A note on .join(): it does not matter whether you call it on a daemon thread or on a regular thread. If you .join() a thread, that statement will wait until either kind of thread is finished.
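To check this behavior of .join() for yourself, the following sketch starts one daemon thread and one regular thread and joins each in turn. The slow_append function and the results list are illustrative names, not part of the examples above.

```python
import threading
import time

results = []

def slow_append(value):
    # Simulate some work before recording that the thread finished.
    time.sleep(0.2)
    results.append(value)

for is_daemon in (True, False):
    t = threading.Thread(target=slow_append, args=(is_daemon,),
                         daemon=is_daemon)
    t.start()
    t.join()                  # blocks until the thread has finished
    assert not t.is_alive()   # holds for the daemon and the regular thread

print(results)
```

Both values end up in the list because each .join() call waits for its thread, whether or not the thread is a daemon.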

Using a ThreadPoolExecutor (Recommended)

There’s an easier way to start up a group of threads than the one you saw above. It’s called a ThreadPoolExecutor, and it’s part of the standard library in concurrent.futures (as of Python 3.2).

The easiest way to create it is as a context manager, using the with statement to manage the creation and destruction of the pool.

Here’s the main from the last example rewritten to use a ThreadPoolExecutor:

    import concurrent.futures

    # [rest of code]

    if __name__ == "__main__":
        format = "%(asctime)s: %(message)s"
        logging.basicConfig(format=format, level=logging.INFO, datefmt="%H:%M:%S")

        with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
            executor.map(thread_function, range(3))

The code creates a ThreadPoolExecutor as a context manager, telling it how many worker threads it wants in the pool. It then uses .map() to step through an iterable of things, in your case range(3), passing each one to a thread in the pool.

The end of the with block causes the ThreadPoolExecutor to do a .join() on each of the threads in the pool. It is strongly recommended that you use ThreadPoolExecutor as a context manager when you can so that you never forget to .join() the threads.
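The example above discards whatever .map() returns. If your worker function returns a value, you can collect the results, as in this small sketch. The square function is a made-up worker, and the sleep is arranged so that later inputs finish first.

```python
import concurrent.futures
import time

def square(n):
    # Later inputs sleep for less time, so they finish earlier.
    time.sleep(0.1 * (3 - n))
    return n * n

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    # .map() yields results in the order of the inputs,
    # not in the order in which the threads happen to finish.
    results = list(executor.map(square, range(3)))

print(results)
```

Even though the thread for input 2 finishes first, the results come back in input order, which often makes .map() easier to reason about than manual thread bookkeeping.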

Basic Synchronization Using Lock

To solve your race condition above, you need to find a way to allow only one thread at a time into the read-modify-write section of your code. The most common tool for this in Python is a Lock. In some other languages this same idea is called a mutex. Mutex comes from MUTual EXclusion, which is exactly what a Lock does.

A Lock is an object that acts like a hall pass. Only one thread at a time can have the Lock. Any other thread that wants the Lock must wait until the owner of the Lock gives it up.

The basic functions to do this are .acquire() and .release(). A thread will call my_lock.acquire() to get the lock. If the lock is already held, the calling thread will wait until it is released. There’s an important point here. If one thread gets the lock but never gives it back, your program will be stuck. You’ll read more about this later.

Fortunately, Python’s Lock will also operate as a context manager, so you can use it in a with statement, and it gets released automatically when the with block exits for any reason.
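The two styles can be compared side by side. In this sketch (the counter and function names are illustrative), one function pairs .acquire() with .release() in a try/finally block, and the other uses the with statement to get the same guarantee in less code.

```python
import threading

counter = 0
counter_lock = threading.Lock()

def add_explicit(times):
    """Increment the counter using explicit acquire/release calls."""
    global counter
    for _ in range(times):
        counter_lock.acquire()
        try:
            counter += 1
        finally:
            # Release even if the protected code raises an exception.
            counter_lock.release()

def add_with(times):
    """Increment the counter using the lock as a context manager."""
    global counter
    for _ in range(times):
        with counter_lock:
            counter += 1

threads = []
for func in (add_explicit, add_with):
    for _ in range(2):
        threads.append(threading.Thread(target=func, args=(10_000,)))
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)
```

Four threads each add 10,000, and because every increment happens while the lock is held, no update is lost.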

Let’s look at the FakeDatabase with a Lock added to it. The calling function stays the same:

    class FakeDatabase:
        def __init__(self):
            self.value = 0
            self._lock = threading.Lock()

        def locked_update(self, name):
            logging.info("Thread %s: starting update", name)
            logging.debug("Thread %s about to lock", name)
            with self._lock:
                logging.debug("Thread %s has lock", name)
                local_copy = self.value
                local_copy += 1
                time.sleep(0.1)
                self.value = local_copy
                logging.debug("Thread %s about to release lock", name)
            logging.debug("Thread %s after release", name)
            logging.info("Thread %s: finishing update", name)

It’s worth noting here that the thread running this function will hold on to that Lock until it is completely finished updating the database. In this case, that means it will hold the Lock while it copies, updates, sleeps, and then writes the value back to the database.
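The calling code is not shown in this excerpt, so the driver below is an assumption: it submits two updates to a ThreadPoolExecutor and checks the final value. The class is repeated here without the logging calls so that the sketch runs on its own.

```python
import concurrent.futures
import threading
import time

class FakeDatabase:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def locked_update(self, name):
        # The whole read-modify-write sequence runs while the lock is held.
        with self._lock:
            local_copy = self.value
            local_copy += 1
            time.sleep(0.1)
            self.value = local_copy

# Assumed driver: two worker threads each perform one update.
database = FakeDatabase()
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    for index in range(2):
        executor.submit(database.locked_update, index)

print(database.value)
```

Because the second thread cannot read the value until the first has written it back, both updates are counted. The price is that the two updates run one after the other instead of overlapping.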

Deadlock

Before you move on, you should look at a common problem when using Locks. As you saw, if the Lock has already been acquired, a second call to .acquire() will wait until the thread that is holding the Lock calls .release(). What do you think happens when you run this code:

    import threading

    l = threading.Lock()
    print("before first acquire")
    l.acquire()
    print("before second acquire")
    l.acquire()
    print("acquired lock twice")

When the program calls l.acquire() the second time, it hangs waiting for the Lock to be released. In this example, you can fix the deadlock by removing the second call, but deadlocks usually happen from one of two subtle things:

1. An implementation bug where a Lock is not released properly

2. A design issue where a utility function needs to be called by functions that might or might not already have the Lock
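The second situation is the one that threading.RLock is designed for: the thread that already holds an RLock may acquire it again without blocking. The sketch below uses a hypothetical Account class to show this, and then uses the timeout argument of Lock.acquire() to show that a plain Lock refuses a second acquisition.

```python
import threading

class Account:
    def __init__(self):
        self.balance = 0
        self._lock = threading.RLock()

    def deposit(self, amount):
        # Utility method: may be called with or without the lock held.
        with self._lock:
            self.balance += amount

    def deposit_twice(self, amount):
        with self._lock:           # outer acquisition
            self.deposit(amount)   # same thread re-acquires: allowed
            self.deposit(amount)

account = Account()
account.deposit_twice(5)
print(account.balance)

# A plain Lock cannot be acquired twice. The timeout makes the second
# attempt give up and return False instead of hanging forever.
plain = threading.Lock()
plain.acquire()
second_attempt = plain.acquire(timeout=0.1)
plain.release()
print(second_attempt)
```

An RLock must be released once for every time it was acquired, which the nested with blocks take care of here.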

Lock and RLock are two of the basic tools used in threaded programming to prevent race conditions. There are a few others that work in different ways. Before you look at them, let’s shift to a slightly different problem domain.

Threading Objects

There are a few more primitives offered by the Python threading module. While you didn’t need these for the examples above, they can come in handy in different use cases, so it’s good to be familiar with them.

Semaphore

The first Python threading object to look at is threading.Semaphore. A Semaphore is a counter with a few special properties. The first one is that the counting is atomic. This means that there is a guarantee that the operating system will not swap out the thread in the middle of incrementing or decrementing the counter.

The internal counter is incremented when you call .release() and decremented when you call .acquire().

The next special property is that if a thread calls .acquire() when the counter is zero, that thread will block until a different thread calls .release() and increments the counter to one.

Semaphores are frequently used to protect a resource that has a limited capacity. An example would be if you have a pool of connections and want to limit the size of that pool to a specific number.
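The connection-pool idea can be sketched as follows. The limit of two, the use_connection function and the bookkeeping variables are all assumptions made for this example, and a short sleep stands in for real work on a connection.

```python
import threading
import time

MAX_CONNECTIONS = 2                        # assumed pool size
pool_slots = threading.Semaphore(MAX_CONNECTIONS)
state_lock = threading.Lock()              # protects the two counters below
active = 0                                 # connections in use right now
peak = 0                                   # highest value active has reached

def use_connection():
    global active, peak
    with pool_slots:                       # .acquire() on entry, .release() on exit
        with state_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)                   # pretend to use the connection
        with state_lock:
            active -= 1

threads = [threading.Thread(target=use_connection) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(peak)
```

Six threads compete for the pool, but the recorded peak never exceeds MAX_CONNECTIONS, because a thread that finds the counter at zero blocks until another thread leaves its with block.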

Timer