
Introduction to NumPy Arrays: Arrays and Vectorized Computation, Study Notes in Applied Statistics

Instituto Federal de Educação, Ciência e Tecnologia da Paraíba (IFPB), Applied Statistics

Learn to work with NumPy arrays, a Python library for mathematical and vectorized computation. See how to create, manipulate, and perform mathematical operations on multidimensional arrays using NumPy. This chapter covers the basic concepts of NumPy arrays.

Type: Study notes

2021

Shared on 11/03/2022

alexandre-dos-santos-oliveira 🇧🇷


CHAPTER 4

NumPy Basics: Arrays and Vectorized Computation

NumPy, short for Numerical Python, is one of the most important foundational packages for numerical computing in Python. Most computational packages providing scientific functionality use NumPy’s array objects as the lingua franca for data exchange.

Here are some of the things you’ll find in NumPy:

• ndarray, an efficient multidimensional array providing fast array-oriented arithmetic operations and flexible broadcasting capabilities.
• Mathematical functions for fast operations on entire arrays of data without having to write loops.
• Tools for reading/writing array data to disk and working with memory-mapped files.
• Linear algebra, random number generation, and Fourier transform capabilities.
• A C API for connecting NumPy with libraries written in C, C++, or FORTRAN.

Because NumPy provides an easy-to-use C API, it is straightforward to pass data to external libraries written in a low-level language and also for external libraries to return data to Python as NumPy arrays. This feature has made Python a language of choice for wrapping legacy C/C++/Fortran codebases and giving them a dynamic and easy-to-use interface.

While NumPy by itself does not provide modeling or scientific functionality, having an understanding of NumPy arrays and array-oriented computing will help you use tools with array-oriented semantics, like pandas, much more effectively. Since NumPy is a large topic, I will cover many advanced NumPy features like broadcasting in more depth later (see Appendix A).

For most data analysis applications, the main areas of functionality I’ll focus on are:

• Fast vectorized array operations for data munging and cleaning, subsetting and filtering, transformation, and any other kinds of computations
• Common array algorithms like sorting, unique, and set operations
• Efficient descriptive statistics and aggregating/summarizing data
• Data alignment and relational data manipulations for merging and joining together heterogeneous datasets
• Expressing conditional logic as array expressions instead of loops with if-elif-else branches
• Group-wise data manipulations (aggregation, transformation, function application)

While NumPy provides a computational foundation for general numerical data processing, many readers will want to use pandas as the basis for most kinds of statistics or analytics, especially on tabular data. pandas also provides some more domain-specific functionality like time series manipulation, which is not present in NumPy.

Array-oriented computing in Python traces its roots back to 1995, when Jim Hugunin created the Numeric library. Over the next 10 years, many scientific programming communities began doing array programming in Python, but the library ecosystem had become fragmented in the early 2000s. In 2005, Travis Oliphant was able to forge the NumPy project from the then Numeric and Numarray projects to bring the community together around a single array computing framework.

One of the reasons NumPy is so important for numerical computations in Python is that it is designed for efficiency on large arrays of data. There are a number of reasons for this:

• NumPy internally stores data in a contiguous block of memory, independent of other built-in Python objects. NumPy’s library of algorithms written in the C language can operate on this memory without any type checking or other overhead. NumPy arrays also use much less memory than built-in Python sequences.
• NumPy operations perform complex computations on entire arrays without the need for Python for loops.
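The following rough sketch illustrates the last two points. It computes the same doubling once with a Python loop over a list and once with a single vectorized NumPy expression. The names `my_list` and `my_arr` are illustrative only. No timings are claimed, since they depend on the machine; IPython's `%timeit` magic can compare the two lines locally.

```python
import numpy as np

# The same one million integers, once as a Python list and once as an ndarray
my_list = list(range(1_000_000))
my_arr = np.arange(1_000_000)

# Loop-based version: the interpreter handles each element separately
list_result = [x * 2 for x in my_list]

# Vectorized version: a single expression; the loop runs inside NumPy's C code
arr_result = my_arr * 2

# Both approaches produce the same values
print(arr_result[:5])   # [0 2 4 6 8]
print(list_result[:5])  # [0, 2, 4, 6, 8]

# The ndarray stores fixed-size integers in one contiguous buffer
print(my_arr.flags["C_CONTIGUOUS"])                    # True
print(my_arr.nbytes == my_arr.size * my_arr.itemsize)  # True
```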

86 | Chapter 4: NumPy Basics: Arrays and Vectorized Computation

array([[-0.4094,  0.9579, -1.0389],
       [-1.1115,  3.9316,  2.7868]])

In the first example, all of the elements have been multiplied by 10. In the second, the corresponding values in each “cell” in the array have been added to each other.
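Both operations can be reproduced with a small, fixed array. The values below are illustrative and are not the random data used above.

```python
import numpy as np

# A fixed 2 x 3 array so the results are easy to check by hand
data = np.array([[1.5, -0.1, 3.0],
                 [0.0, -3.0, 6.5]])

scaled = data * 10      # every element multiplied by 10
doubled = data + data   # corresponding "cells" added elementwise

print(scaled)
print(doubled)
```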

In this chapter and throughout the book, I use the standard NumPy convention of always using import numpy as np. You are, of course, welcome to put from numpy import * in your code to avoid having to write np., but I advise against making a habit of this. The numpy namespace is large and contains a number of functions whose names conflict with built-in Python functions (like min and max).

An ndarray is a generic multidimensional container for homogeneous data; that is, all of the elements must be the same type. Every array has a shape, a tuple indicating the size of each dimension, and a dtype, an object describing the data type of the array:

In [17]: data.shape
Out[17]: (2, 3)

In [18]: data.dtype
Out[18]: dtype('float64')
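Because an ndarray holds a single dtype, mixing integers and floats in the input list upcasts every element to a common type. A minimal sketch with illustrative values follows. The default integer width is platform dependent, so the integer case is only checked to be *some* integer type.

```python
import numpy as np

mixed = np.array([1, 2, 3.5])   # ints and one float in the same list
print(mixed.dtype)              # float64: every element was upcast
print(mixed.shape)              # (3,)

ints_only = np.array([1, 2, 3])
# Platform-dependent default integer (commonly int64 on 64-bit Linux/macOS)
print(ints_only.dtype)
```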

This chapter will introduce you to the basics of using NumPy arrays, and should be sufficient for following along with the rest of the book. While it’s not necessary to have a deep understanding of NumPy for many data analytical applications, becoming proficient in array-oriented programming and thinking is a key step along the way to becoming a scientific Python guru.

Whenever you see “array,” “NumPy array,” or “ndarray” in the text, with few exceptions they all refer to the same thing: the ndarray object.

Creating ndarrays

The easiest way to create an array is to use the array function. This accepts any sequence-like object (including other arrays) and produces a new NumPy array containing the passed data. For example, a list is a good candidate for conversion:

In [19]: data1 = [6, 7.5, 8, 0, 1]

In [20]: arr1 = np.array(data1)

In [21]: arr1
Out[21]: array([ 6. ,  7.5,  8. ,  0. ,  1. ])


Nested sequences, like a list of equal-length lists, will be converted into a multidimensional array:

In [22]: data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]

In [23]: arr2 = np.array(data2)

In [24]: arr2
Out[24]:
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

Since data2 was a list of lists, the NumPy array arr2 has two dimensions with shape inferred from the data. We can confirm this by inspecting the ndim and shape attributes:

In [25]: arr2.ndim
Out[25]: 2

In [26]: arr2.shape
Out[26]: (2, 4)

Unless explicitly specified (more on this later), np.array tries to infer a good data type for the array that it creates. The data type is stored in a special dtype metadata object; for example, in the previous two examples we have:

In [27]: arr1.dtype
Out[27]: dtype('float64')

In [28]: arr2.dtype
Out[28]: dtype('int64')

In addition to np.array, there are a number of other functions for creating new arrays. As examples, zeros and ones create arrays of 0s or 1s, respectively, with a given length or shape. empty creates an array without initializing its values to any particular value. To create a higher dimensional array with these methods, pass a tuple for the shape:

In [29]: np.zeros(10)
Out[29]: array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])

In [30]: np.zeros((3, 6))
Out[30]:
array([[ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.]])

In [31]: np.empty((2, 3, 2))
Out[31]:
array([[[ 0.,  0.],
        [ 0.,  0.],
        [ 0.,  0.]],
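np.ones, np.full, and np.zeros_like are other NumPy creation functions that follow the same shape convention, shown here in a short sketch. Unlike np.empty, they initialize every value, so their contents are predictable.

```python
import numpy as np

ones_2d = np.ones((2, 3))           # 2 x 3 array of 1.0 (float64 by default)
sevens = np.full((2, 2), 7)         # 2 x 2 array filled with the integer 7
like_ones = np.zeros_like(ones_2d)  # zeros with the same shape and dtype as ones_2d

print(ones_2d.shape, ones_2d.dtype)  # (2, 3) float64
print(sevens)
print(like_ones.shape)               # (2, 3)
```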

4.1 The NumPy ndarray: A Multidimensional Array Object | 89

In [36]: arr2.dtype
Out[36]: dtype('int32')

dtypes are a source of NumPy’s flexibility for interacting with data coming from other systems. In most cases they provide a mapping directly onto an underlying disk or memory representation, which makes it easy to read and write binary streams of data to disk and also to connect to code written in a low-level language like C or Fortran.

The numerical dtypes are named the same way: a type name, like float or int, followed by a number indicating the number of bits per element. A standard double-precision floating-point value (what’s used under the hood in Python’s float object) takes up 8 bytes or 64 bits. Thus, this type is known in NumPy as float64. See Table 4-2 for a full listing of NumPy’s supported data types.
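The bit count in a dtype name maps directly onto bytes per element, and the `itemsize` attribute reports that size. A quick sketch that checks the naming rule for a few types:

```python
import numpy as np

for name in ["int8", "int32", "float32", "float64", "complex128"]:
    dt = np.dtype(name)
    # itemsize is reported in bytes, so bits = itemsize * 8
    print(name, dt.itemsize, "bytes =", dt.itemsize * 8, "bits")

arr = np.zeros(1000, dtype=np.float64)
print(arr.nbytes)   # 8000: 1000 elements x 8 bytes each
```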

Don’t worry about memorizing the NumPy dtypes, especially if you’re a new user. It’s often only necessary to care about the general kind of data you’re dealing with, whether floating point, complex, integer, boolean, string, or general Python object. When you need more control over how data are stored in memory and on disk, especially large datasets, it is good to know that you have control over the storage type.

Table 4-2. NumPy data types

Type                               Type code      Description
int8, uint8                        i1, u1         Signed and unsigned 8-bit (1 byte) integer types
int16, uint16                      i2, u2         Signed and unsigned 16-bit integer types
int32, uint32                      i4, u4         Signed and unsigned 32-bit integer types
int64, uint64                      i8, u8         Signed and unsigned 64-bit integer types
float16                            f2             Half-precision floating point
float32                            f4 or f        Standard single-precision floating point; compatible with C float
float64                            f8 or d        Standard double-precision floating point; compatible with C double and Python float object
float128                           f16 or g       Extended-precision floating point
complex64, complex128, complex256  c8, c16, c32   Complex numbers represented by two 32, 64, or 128 floats, respectively
bool                               ?              Boolean type storing True and False values
object                             O              Python object type; a value can be any Python object
string_                            S              Fixed-length ASCII string type (1 byte per character); for example, to create a string dtype with length 10, use 'S10'
unicode_                           U              Fixed-length Unicode type (number of bytes platform specific); same specification semantics as string_ (e.g., 'U10')
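The type codes in Table 4-2 can be passed anywhere a dtype is expected. The sketch below compares a few codes with their named equivalents and shows the silent truncation of a fixed-length byte string. It uses `np.bool_`, the NumPy scalar type behind the `?` code.

```python
import numpy as np

# Each short code resolves to the same dtype as its full name
assert np.dtype("i4") == np.dtype(np.int32)
assert np.dtype("f8") == np.dtype(np.float64)
assert np.dtype("?") == np.dtype(np.bool_)

# A fixed-length byte-string dtype ('S3') silently truncates longer input
short = np.array(["numpy", "pandas"], dtype="S3")
print(short)   # each value cut to 3 bytes
```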


You can explicitly convert or cast an array from one dtype to another using ndarray’s astype method:

In [37]: arr = np.array([1, 2, 3, 4, 5])

In [38]: arr.dtype
Out[38]: dtype('int64')

In [39]: float_arr = arr.astype(np.float64)

In [40]: float_arr.dtype
Out[40]: dtype('float64')

In this example, integers were cast to floating point. If I cast some floating-point numbers to be of integer dtype, the decimal part will be truncated:

In [41]: arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])

In [42]: arr
Out[42]: array([  3.7,  -1.2,  -2.6,   0.5,  12.9,  10.1])

In [43]: arr.astype(np.int32)
Out[43]: array([ 3, -1, -2,  0, 12, 10], dtype=int32)

If you have an array of strings representing numbers, you can use astype to convert them to numeric form:

In [44]: numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_)

In [45]: numeric_strings.astype(float)
Out[45]: array([  1.25,  -9.6 ,  42.  ])

It’s important to be cautious when using the numpy.bytes_ type (named numpy.string_ in NumPy versions before 2.0), as string data in NumPy is fixed size and may truncate input without warning. pandas has more intuitive out-of-the-box behavior on non-numeric data.

If casting were to fail for some reason (like a string that cannot be converted to float64), a ValueError will be raised. Here I was a bit lazy and wrote float instead of np.float64; NumPy aliases the Python types to its own equivalent dtypes.
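Here is a minimal sketch of both the successful cast and the failure case. The string "not a number" is an illustrative input that cannot be parsed as a float, and `cast_failed` is a hypothetical flag used only to record the outcome.

```python
import numpy as np

good = np.array(["1.25", "-9.6", "42"])
print(good.astype(float))   # float is accepted as an alias for np.float64

bad = np.array(["1.25", "not a number"])
cast_failed = False
try:
    bad.astype(np.float64)
except ValueError as exc:
    # NumPy raises instead of returning a partially converted array
    cast_failed = True
    print("cast failed:", exc)
```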

You can also use another array’s dtype attribute:

In [46]: int_array = np.arange(10)

In [47]: calibers = np.array([.22, .270, .357, .380, .44, .50], dtype=np.float64)

In [48]: int_array.astype(calibers.dtype)
Out[48]: array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])


In [57]: arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])

In [58]: arr2
Out[58]:
array([[  0.,   4.,   1.],
       [  7.,   2.,  12.]])

In [59]: arr2 > arr
Out[59]:
array([[False,  True, False],
       [ True, False,  True]], dtype=bool)

Operations between differently sized arrays are called broadcasting and will be discussed in more detail in Appendix A. Having a deep understanding of broadcasting is not necessary for most of this book.
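Appendix A covers broadcasting in depth. As a small preview with illustrative values, a scalar or a one-dimensional row can be combined with a two-dimensional array:

```python
import numpy as np

arr = np.array([[1., 2., 3.],
                [4., 5., 6.]])     # shape (2, 3)
row = np.array([10., 20., 30.])   # shape (3,)

print(arr * 0.5)   # the scalar is broadcast to every element
print(arr + row)   # the row is added to each of the two rows
print(arr > 2.5)   # comparisons broadcast too, producing a boolean array
```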

Basic Indexing and Slicing

NumPy array indexing is a rich topic, as there are many ways you may want to select a subset of your data or individual elements. One-dimensional arrays are simple; on the surface they act similarly to Python lists:

In [60]: arr = np.arange(10)

In [61]: arr
Out[61]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [62]: arr[5]
Out[62]: 5

In [63]: arr[5:8]
Out[63]: array([5, 6, 7])

In [64]: arr[5:8] = 12

In [65]: arr
Out[65]: array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])

As you can see, if you assign a scalar value to a slice, as in arr[5:8] = 12, the value is propagated (or broadcast, as this is called from here on) to the entire selection. An important first distinction from Python’s built-in lists is that array slices are views on the original array. This means that the data is not copied, and any modifications to the view will be reflected in the source array.

To give an example of this, I first create a slice of arr:

In [66]: arr_slice = arr[5:8]

In [67]: arr_slice
Out[67]: array([12, 12, 12])


Now, when I change values in arr_slice, the mutations are reflected in the original array arr:

In [68]: arr_slice[1] = 12345

In [69]: arr
Out[69]: array([    0,     1,     2,     3,     4,    12, 12345,    12,     8,     9])

The “bare” slice [:] will assign to all values in an array:

In [70]: arr_slice[:] = 64

In [71]: arr
Out[71]: array([ 0,  1,  2,  3,  4, 64, 64, 64,  8,  9])

If you are new to NumPy, you might be surprised by this, especially if you have used other array programming languages that copy data more eagerly. As NumPy has been designed to be able to work with very large arrays, you could imagine performance and memory problems if NumPy insisted on always copying data.

If you want a copy of a slice of an ndarray instead of a view, you will need to explicitly copy the array, for example, arr[5:8].copy().
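The sketch below contrasts a view with an explicit copy. `np.shares_memory` is a NumPy function that reports whether two arrays overlap in memory.

```python
import numpy as np

arr = np.arange(10)

view = arr[5:8]            # a view: no data is copied
copied = arr[5:8].copy()   # an independent copy

view[:] = 64               # writes through to arr
copied[:] = -1             # leaves arr untouched

print(arr)                            # positions 5..7 now hold 64
print(np.shares_memory(arr, view))    # True
print(np.shares_memory(arr, copied))  # False
```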

With higher dimensional arrays, you have many more options. In a two-dimensional array, the elements at each index are no longer scalars but rather one-dimensional arrays:

In [72]: arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

In [73]: arr2d[2]
Out[73]: array([7, 8, 9])

Thus, individual elements can be accessed recursively. But that is a bit too much work, so you can instead pass a comma-separated list of indices to select individual elements. These two expressions are equivalent:

In [74]: arr2d[0][2]
Out[74]: 3

In [75]: arr2d[0, 2]
Out[75]: 3

See Figure 4-1 for an illustration of indexing on a two-dimensional array. I find it helpful to think of axis 0 as the “rows” of the array and axis 1 as the “columns.”
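A reduction along each axis of the same 3 x 3 array makes the rows/columns reading concrete. The `sum` method accepts an `axis` argument:

```python
import numpy as np

arr2d = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

print(arr2d.sum(axis=0))   # collapse the rows: one total per column
print(arr2d.sum(axis=1))   # collapse the columns: one total per row
print(arr2d[2, 0])         # row index 2, column index 0 -> 7
```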


In [83]: arr3d
Out[83]:
array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

Similarly, arr3d[1, 0] gives you all of the values whose indices start with (1, 0), forming a 1-dimensional array:

In [84]: arr3d[1, 0]
Out[84]: array([7, 8, 9])

This expression is the same as though we had indexed in two steps:

In [85]: x = arr3d[1]

In [86]: x
Out[86]:
array([[ 7,  8,  9],
       [10, 11, 12]])

In [87]: x[0]
Out[87]: array([7, 8, 9])

Note that in all of these cases where subsections of the array have been selected, the returned arrays are views.
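The following sketch confirms the view behavior in three dimensions. It rebuilds arr3d from the values shown above, writes a scalar through the first sub-array, and then restores that sub-array from a saved copy.

```python
import numpy as np

arr3d = np.array([[[1, 2, 3], [4, 5, 6]],
                  [[7, 8, 9], [10, 11, 12]]])

old_values = arr3d[0].copy()   # save an independent copy of the first 2 x 3 block
arr3d[0] = 42                  # scalar assignment is broadcast over the whole block
print(arr3d[0])                # every entry of the block is now 42

arr3d[0] = old_values          # array assignment restores the original values
print(np.shares_memory(arr3d, arr3d[1, 0]))   # True: indexing returned a view
```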

Indexing with slices

Like one-dimensional objects such as Python lists, ndarrays can be sliced with the familiar syntax:

In [88]: arr
Out[88]: array([ 0,  1,  2,  3,  4, 64, 64, 64,  8,  9])

In [89]: arr[1:6]
Out[89]: array([ 1,  2,  3,  4, 64])

Consider the two-dimensional array from before, arr2d. Slicing this array is a bit different:

In [90]: arr2d
Out[90]:
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [91]: arr2d[:2]
Out[91]:
array([[1, 2, 3],
       [4, 5, 6]])


As you can see, it has sliced along axis 0, the first axis. A slice, therefore, selects a range of elements along an axis. It can be helpful to read the expression arr2d[:2] as “select the first two rows of arr2d.”

You can pass multiple slices just like you can pass multiple indexes:

In [92]: arr2d[:2, 1:]
Out[92]:
array([[2, 3],
       [5, 6]])

When slicing like this, you always obtain array views of the same number of dimensions. By mixing integer indexes and slices, you get lower dimensional slices.

For example, I can select the second row but only the first two columns like so:

In [93]: arr2d[1, :2]
Out[93]: array([4, 5])

Similarly, I can select the third column but only the first two rows like so:

In [94]: arr2d[:2, 2]
Out[94]: array([3, 6])

See Figure 4-2 for an illustration. Note that a colon by itself means to take the entire axis, so you can slice only higher dimensional axes by doing:

In [95]: arr2d[:, :1]
Out[95]:
array([[1],
       [4],
       [7]])
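The effect of mixing integers and slices is easiest to see in the resulting shapes. The following sketch uses the same arr2d values:

```python
import numpy as np

arr2d = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

print(arr2d[:2, 1:].shape)   # (2, 2): two slices keep both dimensions
print(arr2d[1, :2].shape)    # (2,): the integer index drops axis 0
print(arr2d[1:2, :2].shape)  # (1, 2): a length-1 slice keeps axis 0
print(arr2d[:, :1].shape)    # (3, 1): all rows, first column kept as 2-D
```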

Of course, assigning to a slice expression assigns to the whole selection:

In [96]: arr2d[:2, 1:] = 0

In [97]: arr2d
Out[97]:
array([[1, 0, 0],
       [4, 0, 0],
       [7, 8, 9]])


       [ 0.3026,  0.5238,  0.0009,  1.3438],
       [-0.7135, -0.8312, -2.3702, -1.8608]])

Suppose each name corresponds to a row in the data array and we wanted to select all the rows with corresponding name 'Bob'. Like arithmetic operations, comparisons (such as ==) with arrays are also vectorized. Thus, comparing names with the string 'Bob' yields a boolean array:

In [102]: names == 'Bob'
Out[102]: array([ True, False, False,  True, False, False, False], dtype=bool)

This boolean array can be passed when indexing the array:

In [103]: data[names == 'Bob']
Out[103]:
array([[ 0.0929,  0.2817,  0.769 ,  1.2464],
       [ 1.669 , -0.4386, -0.5397,  0.477 ]])

The boolean array must be of the same length as the array axis it’s indexing. You can even mix and match boolean arrays with slices or integers (or sequences of integers; more on this later).

In older NumPy versions, boolean selection did not fail if the boolean array was not the correct length; recent versions raise an IndexError instead. Either way, I recommend care when using this feature.
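In recent NumPy releases, a length mismatch raises an IndexError rather than being silently accepted. The sketch below, with small illustrative `names` and `data` arrays, can be used to check how the installed version behaves.

```python
import numpy as np

names = np.array(["Bob", "Joe", "Will", "Bob"])
data = np.arange(8).reshape((4, 2))

mask = names == "Bob"      # length 4, matches axis 0 of data
print(data[mask])          # rows 0 and 3

too_short = mask[:3]       # length 3: does not match axis 0
try:
    data[too_short]
    mismatch_raised = False
except IndexError as exc:
    mismatch_raised = True
    print("IndexError:", exc)
```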

In these examples, I select from the rows where names == 'Bob' and index the columns, too:

In [104]: data[names == 'Bob', 2:]
Out[104]:
array([[ 0.769 ,  1.2464],
       [-0.5397,  0.477 ]])

In [105]: data[names == 'Bob', 3]
Out[105]: array([ 1.2464,  0.477 ])

To select everything but 'Bob', you can either use != or negate the condition using ~:

In [106]: names != 'Bob'
Out[106]: array([False,  True,  True, False,  True,  True,  True], dtype=bool)

In [107]: data[~(names == 'Bob')]
Out[107]:
array([[ 1.0072, -1.2962,  0.275 ,  0.2289],
       [ 1.3529,  0.8864, -2.0016, -0.3718],
       [ 3.2489, -1.0212, -0.5771,  0.1241],
       [ 0.3026,  0.5238,  0.0009,  1.3438],
       [-0.7135, -0.8312, -2.3702, -1.8608]])


The ~ operator can be useful when you want to invert a general condition:

In [108]: cond = names == 'Bob'

In [109]: data[~cond]
Out[109]:
array([[ 1.0072, -1.2962,  0.275 ,  0.2289],
       [ 1.3529,  0.8864, -2.0016, -0.3718],
       [ 3.2489, -1.0212, -0.5771,  0.1241],
       [ 0.3026,  0.5238,  0.0009,  1.3438],
       [-0.7135, -0.8312, -2.3702, -1.8608]])

To select two of the three names, combine multiple boolean conditions with boolean arithmetic operators like & (and) and | (or):

In [110]: mask = (names == 'Bob') | (names == 'Will')

In [111]: mask
Out[111]: array([ True, False,  True,  True,  True, False, False], dtype=bool)

In [112]: data[mask]
Out[112]:
array([[ 0.0929,  0.2817,  0.769 ,  1.2464],
       [ 1.3529,  0.8864, -2.0016, -0.3718],
       [ 1.669 , -0.4386, -0.5397,  0.477 ],
       [ 3.2489, -1.0212, -0.5771,  0.1241]])

Selecting data from an array by boolean indexing always creates a copy of the data, even if the returned array is unchanged.

The Python keywords and and or do not work with boolean arrays. Use & (and) and | (or) instead.
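The sketch below shows what goes wrong with the Python keywords, alongside the working operator form. The parentheses matter because & and | bind more tightly than comparisons. `keyword_failed` is a hypothetical flag that records the outcome.

```python
import numpy as np

values = np.array([-2, 0, 3, 7])

# Element-wise "and": parenthesize each comparison
in_range = (values > 0) & (values < 5)
print(in_range)   # only the element 3 satisfies both conditions

try:
    (values > 0) and (values < 5)   # `and` needs a single True/False value
    keyword_failed = False
except ValueError as exc:
    keyword_failed = True
    print("ValueError:", exc)
```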

Setting values with boolean arrays works in a common-sense way. To set all of the negative values in data to 0 we need only do:

In [113]: data[data < 0] = 0

In [114]: data
Out[114]:
array([[ 0.0929,  0.2817,  0.769 ,  1.2464],
       [ 1.0072,  0.    ,  0.275 ,  0.2289],
       [ 1.3529,  0.8864,  0.    ,  0.    ],
       [ 1.669 ,  0.    ,  0.    ,  0.477 ],
       [ 3.2489,  0.    ,  0.    ,  0.1241],
       [ 0.3026,  0.5238,  0.0009,  1.3438],
       [ 0.    ,  0.    ,  0.    ,  0.    ]])


array([[ 5.,  5.,  5.,  5.],
       [ 3.,  3.,  3.,  3.],
       [ 1.,  1.,  1.,  1.]])

Passing multiple index arrays does something slightly different; it selects a one-dimensional array of elements corresponding to each tuple of indices:

In [122]: arr = np.arange(32).reshape((8, 4))

In [123]: arr
Out[123]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [124]: arr[[1, 5, 7, 2], [0, 3, 1, 2]]
Out[124]: array([ 4, 23, 29, 10])

We’ll look at the reshape method in more detail in Appendix A.

Here the elements (1, 0), (5, 3), (7, 1), and (2, 2) were selected. Regardless of how many dimensions the array has (here, only 2), the result of fancy indexing is always one-dimensional.

The behavior of fancy indexing in this case is a bit different from what some users might have expected (myself included), which is the rectangular region formed by selecting a subset of the matrix’s rows and columns. Here is one way to get that:

In [125]: arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]
Out[125]:
array([[ 4,  7,  5,  6],
       [20, 23, 21, 22],
       [28, 31, 29, 30],
       [ 8, 11,  9, 10]])

Keep in mind that fancy indexing, unlike slicing, always copies the data into a new array.
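np.ix_ gets the rectangular region directly, without two-step indexing, by building an open mesh from the row and column index lists. The sketch below also checks that fancy indexing returns a copy rather than a view.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))
rows, cols = [1, 5, 7, 2], [0, 3, 1, 2]

pairs = arr[rows, cols]              # one element per (row, col) pair
block = arr[np.ix_(rows, cols)]      # 4 x 4 rectangular selection
two_step = arr[rows][:, cols]        # same region, selected in two steps

print(pairs)
print(np.array_equal(block, two_step))   # True

block[0, 0] = -1                     # modifying the fancy-indexed copy...
print(arr[1, 0])                     # ...leaves arr unchanged (still 4)
```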

Transposing Arrays and Swapping Axes

Transposing is a special form of reshaping that similarly returns a view on the underlying data without copying anything. Arrays have the transpose method and also the special T attribute:

In [126]: arr = np.arange(15).reshape((3, 5))

In [127]: arr
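The following self-contained sketch uses the same 3 x 5 array as In [126]. It checks with np.shares_memory that .T is a view, and computes the inner product of the array's transpose with itself, a common use of transposing.

```python
import numpy as np

arr = np.arange(15).reshape((3, 5))

print(arr.shape, arr.T.shape)         # (3, 5) (5, 3)
print(np.shares_memory(arr, arr.T))   # True: no data copied

# With no arguments, transpose() matches .T for a 2-D array
print(np.array_equal(arr.transpose(), arr.T))   # True

# Inner product X^T X, a common use of the transpose
print(arr.T @ arr)
```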
