String Manipulation with stringr: Understanding Regular Expressions (Study notes, Advanced Computer Programming)

An introduction to string manipulation using the R package 'stringr'. A particular focus is given to regular expressions, which are used to describe patterns in strings. Topics covered include using regular expressions as pattern arguments, character classes, repetition, and shortcuts. The document also includes examples using the NEISS dataset.

Typology: Study notes

2021/2022

Uploaded on 08/05/2022 by jacqueline_nel


STRING MANIPULATION WITH STRINGR
Regular expressions


● A language for describing patterns

● "the start of the string, followed by any single character, followed by one or more digits"

^.[\d]+
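As a cross-check outside stringr, the same pattern can be tried with base R's grepl() and regmatches(). This is a minimal sketch that assumes perl = TRUE (PCRE) treats these simple constructs the way stringr's ICU engine does; the variable names are illustrative. The backslash in \d has to be doubled inside an R string.

```r
# Base R sketch: no packages needed.
# The regex ^.\d+ is written "^.\\d+" in an R string (the backslash is escaped).
droids <- c("R2-D2", "C-3P0")
pattern <- "^.\\d+"

# TRUE where the string starts with any character followed by one or more digits
matched <- grepl(pattern, droids, perl = TRUE)
print(matched)
#> [1]  TRUE FALSE

# Extract the matching part; strings without a match are dropped
first_match <- regmatches(droids, regexpr(pattern, droids, perl = TRUE))
print(first_match)
#> [1] "R2"
```

"C-3P0" fails because the character after the first one is "-", not a digit.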


Regular expressions as a pattern argument

library(stringr)
library(rebus)

droids <- c("R2-D2", "C-3P0")
str_detect(droids, pattern = START %R% ANY_CHAR %R% one_or_more(DGT))
#> [1]  TRUE FALSE
str_view(droids, pattern = START %R% ANY_CHAR %R% one_or_more(DGT))

str_view() highlights the matches in the HTML viewer.

Let’s practice!

Regular expression review

Pattern                             Regular expression   rebus
Start of string                     ^                    START
End of string                       $                    END
Any single character                .                    ANY_CHAR
Literal dot, caret or dollar sign   \. \^ \$             DOT, CARET, DOLLAR
Alternation                         (?:dog|cat)          or("dog", "cat")

(?:dog|cat) is a non-capturing group; the capturing form (dog|cat) matches the same strings. For example:

str_view(c("kittycat", "doggone"), pattern = or("dog", "cat"))
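Each row of the table can be exercised with base R. The sketch below is illustrative only: the test strings are made up, perl = TRUE is assumed to behave like stringr's ICU engine for these patterns, and the rebus expression each regex corresponds to is given in a comment.

```r
# Base R sketch of the review table (PCRE syntax via perl = TRUE).
x <- c("dog", "hotdog", "dog.", "$5", "kittycat", "doggone")

starts_dog <- grepl("^dog", x, perl = TRUE)   # START %R% "dog"
# -> TRUE FALSE TRUE FALSE FALSE TRUE
ends_dog   <- grepl("dog$", x, perl = TRUE)   # "dog" %R% END
# -> TRUE TRUE FALSE FALSE FALSE FALSE
has_dot    <- grepl("\\.", x, perl = TRUE)    # DOT: a literal "."
# -> only "dog." matches
has_dollar <- grepl("\\$", x, perl = TRUE)    # DOLLAR: a literal "$"
# -> only "$5" matches

# Alternation, the regex listed for or("dog", "cat"):
# first "dog" or "cat" in each string; "$5" has none and is dropped
pet <- regmatches(x, regexpr("(?:dog|cat)", x, perl = TRUE))
# -> "dog" "dog" "dog" "cat" "dog"
```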

Repetition

str_view(c("apple", "Aaron"), pattern = one_or_more("Aa"))

Pattern                   Regular expression   rebus
Optional                  ?                    optional()
Zero or more              *                    zero_or_more()
One or more               +                    one_or_more()
Between n and m times     {n,m}                repeated()
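A sketch of the four quantifiers in base R, using made-up words and codes (an assumption for illustration, with perl = TRUE). Anchoring with ^ and $ forces the whole string to match, which makes the difference between ?, * and + visible.

```r
# Base R sketch of the repetition quantifiers (PCRE via perl = TRUE).
words <- c("color", "colour", "colouur")

optional_u <- grepl("^colou?r$", words, perl = TRUE)  # ? : optional(), zero or one "u"
# -> TRUE TRUE FALSE
any_u      <- grepl("^colou*r$", words, perl = TRUE)  # * : zero_or_more()
# -> TRUE TRUE TRUE
some_u     <- grepl("^colou+r$", words, perl = TRUE)  # + : one_or_more()
# -> FALSE TRUE TRUE

# {n,m}: repeated(), here between 2 and 3 digits
codes <- c("7", "42", "123", "2024")
two_to_three <- grepl("^\\d{2,3}$", codes, perl = TRUE)
# -> FALSE TRUE TRUE FALSE
```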


Ranges in character classes

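DOLLAR %R% char_class("0123456789") builds \$[0123456789]: a literal dollar sign followed by any one digit. A range inside the brackets shortens the list:

rebus                Regular expression   Matches
char_class("0-9")    [0-9]                A digit
char_class("a-z")    [a-z]                A lower case letter
char_class("A-Z")    [A-Z]                An upper case letter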
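A base R sketch of ranges, with made-up strings. It assumes perl = TRUE; the rebus expression for each regex is noted in a comment.

```r
# Base R sketch of ranges inside character classes (perl = TRUE assumed).
prices <- c("$5", "5$", "$x", "cost: $12")

# \$[0-9]: a literal "$" followed by one digit, i.e. DOLLAR %R% char_class("0-9")
dollar_digit <- grepl("\\$[0-9]", prices, perl = TRUE)
# -> TRUE FALSE FALSE TRUE

# The spelled-out class [0123456789] gives the same result as the range [0-9]
same_as_list <- identical(dollar_digit, grepl("\\$[0123456789]", prices, perl = TRUE))

text <- "R2-D2 and friends"
# [a-z]+: runs of lower case letters
lower_runs <- regmatches(text, gregexpr("[a-z]+", text, perl = TRUE))[[1]]
# -> "and" "friends"
# [A-Z]: single upper case letters
upper_letters <- regmatches(text, gregexpr("[A-Z]", text, perl = TRUE))[[1]]
# -> "R" "D"
```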

Shortcuts

rebus   Regular expression   Matches                  Same as (ASCII text)
DGT     \d                   A digit                  char_class("0-9"), i.e. [0-9]
WRD     \w                   A word character         char_class("a-zA-Z0-9_"), i.e. [a-zA-Z0-9_]
SPC     \s                   A whitespace character
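The shortcuts can be compared with their bracketed equivalents in base R. This sketch uses a made-up string and perl = TRUE; for ASCII input like this, \d and [0-9] find the same characters.

```r
# Base R sketch of the shortcut classes (PCRE via perl = TRUE).
s <- "R2-D2 beeps\tat C-3P0"   # contains a space and a tab

digits    <- regmatches(s, gregexpr("\\d", s, perl = TRUE))[[1]]   # DGT
# -> "2" "2" "3" "0"
word_runs <- regmatches(s, gregexpr("\\w+", s, perl = TRUE))[[1]]  # one_or_more(WRD)
# -> "R2" "D2" "beeps" "at" "C" "3P0"
spaced    <- gsub("\\s", "_", s, perl = TRUE)                       # SPC: spaces and tabs
# -> "R2-D2_beeps_at_C-3P0"

# For ASCII text, \d and [0-9] find the same characters
same_digits <- identical(digits, regmatches(s, gregexpr("[0-9]", s, perl = TRUE))[[1]])
```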
