Data Science Key Concepts: Chapters 1-5, Exams of Database Programming

A concise overview of key concepts in data science, from basic definitions to data visualization techniques. It covers variables, types of data analytics, data structures such as data frames, and various charting methods, presented in a question-and-answer format that is easy to review. It also covers descriptive and inferential statistics, sampling methods, potential biases in surveys, and introductory probability and probability distributions. It is intended for students and professionals who want to grasp the fundamentals of data science and programming quickly.

Typology: Exams

2025/2026

Available from 10/13/2025

Ollivia-

Programming for Data Science
Chapters 1-5 rated A
Data - correct answer ✅Data is information, especially facts or numbers, usually collected or computed for purposes of analysis.
zettabyte - correct answer ✅A zettabyte is one sextillion or 10^21 bytes.
Data analytics - correct answer ✅Data analytics is the field of analyzing data to gain insight, draw conclusions, or make decisions.
Big data - correct answer ✅Big data refers to very large data sets that cannot be processed by traditional methods, and is characterized by high volume, rapid velocity of collection, and variety in type and quality.
Descriptive - correct answer ✅Descriptive data analytics seeks to describe data, providing insight and knowledge.
Predictive - correct answer ✅Predictive data analytics seeks to make predictions from data.
Prescriptive - correct answer ✅Prescriptive data analytics seeks to make decisions (prescriptions) based on data.
variable - correct answer ✅A variable is an item that can have different ("varying") values.

quantitative variable - correct answer ✅A quantitative variable can take on a numeric value (quantitative data) that can be measured and ordered.
categorical variable - correct answer ✅A categorical variable can take on the value (usually a label) of one of several categories.
qualitative variable - correct answer ✅A categorical variable is often called a qualitative variable (known by qualities, rather than quantities).
nominal variable - correct answer ✅A nominal variable's categories have no ordering, existing in name only, like apples, oranges, and grapes.
ordinal variable - correct answer ✅An ordinal variable's categories have an ordering, like disagree, neutral, and agree.
continuous variable - correct answer ✅A continuous variable can take on infinitely many values along a continuum within a range, typically real numbers.
discrete variable - correct answer ✅A discrete variable has a finite number of distinct values within a range, typically integers.
Data visualization - correct answer ✅Data visualization is the display of data in a format, such as a table or chart, that seeks to achieve a goal of conveying particular information to a viewer.
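
As a minimal sketch, assuming pandas is installed, the nominal/ordinal distinction above maps onto pandas' categorical dtype: an unordered categorical behaves like a nominal variable, while an ordered one supports comparisons and min/max like an ordinal variable. The category values come from the definitions; the variable names `fruit` and `opinion` are hypothetical.

```python
import pandas as pd

# Nominal: categories exist "in name only", so no ordering is declared.
fruit = pd.Series(
    ["apples", "grapes", "oranges", "apples"],
    dtype=pd.CategoricalDtype(["apples", "oranges", "grapes"], ordered=False),
)

# Ordinal: categories have a meaningful order, listed from lowest to highest.
opinion = pd.Series(
    ["agree", "neutral", "disagree", "agree"],
    dtype=pd.CategoricalDtype(["disagree", "neutral", "agree"], ordered=True),
)

print(fruit.cat.ordered)    # False
print(opinion.cat.ordered)  # True

# The declared order makes comparisons and min/max meaningful for the ordinal variable.
print(opinion.max())                # agree
print((opinion > "neutral").sum())  # 2
```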

columns - correct answer ✅A data frame's columns are the labels of the column data.
values - correct answer ✅The data contained in a data frame are also known as values.
pandas - correct answer ✅Pandas is a Python library that allows a user to work with data frames by providing tools for reading, writing, subsetting, and reshaping data.
attribute - correct answer ✅An attribute is a characteristic of an object.
method - correct answer ✅A method is a procedure associated with an object.
Subsetting - correct answer ✅Subsetting is the process of retrieving parts of a data frame.
long form - correct answer ✅A data frame is in long form when each column is a variable and each row gives non-repeated data.
wide form - correct answer ✅A data frame is in wide form if each data variable is in a different column.
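
A short sketch of the data frame vocabulary above, assuming pandas; the column names and values are hypothetical. It contrasts attributes (accessed without parentheses) with methods (called with parentheses) and shows three common ways of subsetting.

```python
import pandas as pd

# A small data frame with hypothetical example data.
df = pd.DataFrame(
    {
        "city": ["Austin", "Boston", "Chicago", "Denver"],
        "temp_f": [95, 78, 84, 88],
        "rain_in": [0.0, 0.4, 0.1, 0.0],
    }
)

# Attributes are characteristics of the object (no parentheses).
print(df.columns.tolist())  # ['city', 'temp_f', 'rain_in']
print(df.shape)             # (4, 3)

# Methods are procedures associated with the object (called with parentheses).
print(df.head(2))

# Subsetting: retrieving parts of the data frame.
temps = df["temp_f"]           # one column, returned as a Series
hot = df[df["temp_f"] > 85]    # only the rows that meet a condition
cell = df.loc[1, "city"]       # a single value by row label and column label

print(hot["city"].tolist())  # ['Austin', 'Denver']
print(cell)                  # Boston
```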

Reshaping data - correct answer ✅Reshaping data involves converting a data frame from one form into the other.
Pivoting - correct answer ✅Pivoting converts a data frame from long form to wide form.
Melting - correct answer ✅Melting converts a data frame from wide form to long form.
bar chart - correct answer ✅A bar chart depicts data values for a categorical variable, using rectangular bars having lengths proportional to category values.
category label - correct answer ✅Each listed category has a category label.
data labels - correct answer ✅Data values known as data labels can be shown next to the bars, or even inside the bars.
column chart - correct answer ✅A column chart is a term used for a vertical bar chart.
relative-frequency bar chart - correct answer ✅A relative-frequency bar chart shows each category's portion of the total data, typically as a percentage.
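
A minimal sketch of melting and pivoting with pandas `DataFrame.melt` and `DataFrame.pivot`; the student names, years, and scores are hypothetical. Melting turns the year columns into a single `year` variable (long form), and pivoting reverses that step (wide form).

```python
import pandas as pd

# Wide form: each year's score is in a different column.
wide = pd.DataFrame(
    {"student": ["Ana", "Ben"], "2023": [81, 74], "2024": [88, 79]}
)

# Melting: wide form -> long form (one row per student-year value).
long = wide.melt(id_vars="student", var_name="year", value_name="score")
print(long)

# Pivoting: long form -> wide form (the years become columns again).
back = long.pivot(index="student", columns="year", values="score")
print(back)
```

`pivot` requires each index/column pair to be unique; when duplicates exist, `pivot_table` with an aggregation function is the usual alternative.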

dependent variable - correct answer ✅The variable that is controlled by an observer or is a reason for variation is the independent variable, while the variable that is then determined based on that variable is the dependent variable.
regression curve - correct answer ✅A regression curve is a curve added to a scatter plot that shows the relationship between two variables.
strip plot - correct answer ✅A strip plot is a scatter plot where a categorical variable represents one axis and a numerical variable represents the other.
Jittering - correct answer ✅Jittering is the addition of small random noise to the positions of plotted points in order to prevent or minimize overlapping data points.
swarm plot - correct answer ✅A swarm plot uses an algorithm to position points so that they keep a minimum distance from one another and do not overlap.
line chart - correct answer ✅A line chart (or line graph) depicts data trends by using straight lines to connect successive data points in a scatter plot.
line graph - correct answer ✅A line chart (or line graph) depicts data trends by using straight lines to connect successive data points in a scatter plot.
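
A minimal sketch of jittering with NumPy, assuming points sit on a categorical axis at integer positions; the `jitter` helper and the noise width of 0.1 are hypothetical choices. Plotting libraries such as seaborn expose this as an option (for example, the `jitter` parameter of `stripplot`); the sketch only shows the underlying idea of adding random noise to positions, not to the data values themselves.

```python
import numpy as np

def jitter(positions, width=0.1, seed=None):
    """Return positions with uniform random noise in [-width, width) added.

    Spreads out overlapping points along a categorical axis; the noise
    changes where points are drawn, not the measured values.
    """
    rng = np.random.default_rng(seed)
    positions = np.asarray(positions, dtype=float)
    return positions + rng.uniform(-width, width, size=positions.shape)

# Hypothetical strip-plot data: category index on x, a numeric value on y.
categories = np.array([0, 0, 0, 1, 1, 1])           # group A = 0, group B = 1
values = np.array([5.0, 5.0, 5.0, 7.0, 7.0, 8.0])   # repeated y values would overlap

x_jittered = jitter(categories, width=0.1, seed=42)
print(np.round(x_jittered, 3))
```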

linear trend line - correct answer ✅A linear trend line is a straight line that depicts the general direction in which the data change from the first to the last data point, often added to summarize the entire chart.
Descriptive statistics - correct answer ✅Descriptive statistics focuses on summarizing survey data about a sample drawn from a population.
Inferential statistics - correct answer ✅Inferential statistics focuses on using information from the sample to make conclusions about the population from which the sample was drawn.
Surveys - correct answer ✅Surveys are conducted to allow statisticians to make generalizations about a population.
population - correct answer ✅A population is any collection of objects, people, or things about which statistical inferences are made.
parameter - correct answer ✅A parameter of a population is a numerical characteristic of a population, such as mean, median, or standard deviation.
sampling unit - correct answer ✅A sampling unit is an individual in the population on which a measurement can be taken.
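
The definition above does not say how the trend line is computed. A common choice, assumed here, is an ordinary least-squares fit, sketched with NumPy's `polyfit`; the data are hypothetical.

```python
import numpy as np

# Hypothetical observations (x = month number, y = measured value).
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([10.0, 12.5, 13.1, 15.8, 16.2, 18.9])

# Least-squares straight line y ≈ slope * x + intercept.
# polyfit returns coefficients from the highest power down: [slope, intercept].
slope, intercept = np.polyfit(x, y, deg=1)

# Points on the trend line, e.g. for drawing over a line chart.
trend = slope * x + intercept
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```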

Voluntary response bias - correct answer ✅Voluntary response bias occurs when a sample is biased toward members that self-select for participation in a survey.
Response bias - correct answer ✅Response bias can result if the responses of survey participants are affected by how a question is asked or the behaviors or attitudes of the participant.
Acquiescence bias - correct answer ✅Acquiescence bias occurs when respondents tend to agree with a statement in a survey.
Extreme responding - correct answer ✅Extreme responding occurs when respondents tend to select the most extreme options available.
Social desirability bias - correct answer ✅Social desirability bias occurs when respondents tend to answer questions in a way that is socially accepted by others.
simple random sampling - correct answer ✅In simple random sampling, a sample is constructed by random selection from the population.
systematic sampling - correct answer ✅In systematic sampling, every kth unit from a population of N units is selected to be in a sample.
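
A minimal sketch of simple random and systematic sampling using Python's standard `random` module; the helper names and the population of 100 numbered units are hypothetical. For systematic sampling, the sketch assumes the interval is k = N // n with a random start inside the first interval, which is one common convention.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Select n units without replacement; every unit has the same chance."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=None):
    """Select every kth unit after a random start, with k = N // n (assumes 0 < n <= N)."""
    N = len(population)
    k = N // n                  # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)    # random start within the first interval
    return [population[i] for i in range(start, N, k)][:n]

population = list(range(1, 101))   # hypothetical population of N = 100 unit IDs
print(simple_random_sample(population, 10, seed=1))
print(systematic_sample(population, 10, seed=1))
```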

stratified sampling - correct answer ✅In stratified sampling, the population is first divided into groups, or strata, depending on some characteristic. Next, samples within each stratum are randomly selected in a proportional manner.
cluster sampling - correct answer ✅In cluster sampling, the population is first divided into groups, or clusters, depending on some characteristic. Next, the sample is constructed by randomly selecting one or more clusters.
convenience sampling - correct answer ✅In convenience sampling, units are drawn from a subset of the population that is readily available.
arithmetic mean - correct answer ✅A common data summary is the arithmetic mean or mean, which is the sum of the data values in a dataset divided by the number of values in the dataset.
mean - correct answer ✅A common data summary is the arithmetic mean or mean, which is the sum of the data values in a dataset divided by the number of values in the dataset.
weighted mean - correct answer ✅The weighted mean is a measure of center in which each value is multiplied by a weight before averaging, so some values count more than others (for example, as if they were counted more than once).
median - correct answer ✅The median is the middle value in a sorted dataset; when the number of values is even, it is the mean of the two middle values.
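
A short sketch of the measures of center above, using Python's `statistics` module for the mean and median and a plain expression for the weighted mean; the datasets and weights are hypothetical.

```python
from statistics import mean, median

data = [4, 8, 15, 16, 23, 42]   # hypothetical dataset with an even number of values

# Arithmetic mean: sum of the values divided by the number of values.
print(mean(data))     # 18

# Median of sorted data; with an even count, the two middle values are averaged.
print(median(data))   # 15.5

# Weighted mean: multiply each value by its weight, divide by the total weight.
values = [70, 80, 90]
weights = [1, 2, 1]     # 80 counts as if it appeared twice
weighted = sum(v * w for v, w in zip(values, weights)) / sum(weights)
print(weighted)       # 80.0
```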

range - correct answer ✅The range of a dataset is the difference between the maximum and minimum of the dataset.
percentile - correct answer ✅The nth percentile of a dataset is the data value such that n percent of the data falls at or below that value.
first quartile - correct answer ✅The first quartile (Q1) is the 25th percentile. One-quarter of the data fall at or below Q1.
third quartile - correct answer ✅The third quartile (Q3) is the 75th percentile. Three-quarters of the data fall at or below Q3.
five-number summary - correct answer ✅Collectively, the minimum and maximum values, Q1, median, and Q3 form a set of descriptive statistics called the five-number summary.
box plot - correct answer ✅A box plot is a data visualization that uses a box and several lines to depict the distribution of data in a dataset.
skew - correct answer ✅The skew describes how asymmetric a distribution is; a simple measure of it is the difference between the mean and the median.
interquartile range (IQR) - correct answer ✅The interquartile range (IQR) of a dataset is the difference between Q3 and Q1 (Q3 − Q1), or the length of the box in a box plot.
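
A minimal sketch of the five-number summary, range, and IQR with NumPy's `percentile`, on a hypothetical dataset. Quartile conventions differ between textbooks and tools; this relies on NumPy's default linear interpolation, so results on other datasets may differ slightly from hand calculations.

```python
import numpy as np

data = np.array([7, 1, 3, 9, 5, 2, 8, 4, 6])   # hypothetical dataset

# Minimum, Q1, median, Q3, maximum as the 0th, 25th, 50th, 75th, 100th percentiles.
minimum, q1, med, q3, maximum = np.percentile(data, [0, 25, 50, 75, 100])

print("five-number summary:", minimum, q1, med, q3, maximum)
print("range:", maximum - minimum)         # maximum minus minimum
print("IQR:", q3 - q1)                     # Q3 - Q1, the length of the box
print("mean - median:", data.mean() - med) # the simple skew indicator above
```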

frequency distribution - correct answer ✅A frequency distribution is a table that displays how often an outcome occurs for a sample.
class - correct answer ✅A class is either a value of a categorical variable or an interval of a continuous variable.
frequency - correct answer ✅The frequency of a class is the number of events or values that fall in that class.
histogram - correct answer ✅A histogram depicts data values by splitting a continuous variable into a number of class intervals, each known as a bin.
class intervals - correct answer ✅A histogram depicts data values by splitting a continuous variable into a number of class intervals, each known as a bin.
bin - correct answer ✅A histogram depicts data values by splitting a continuous variable into a number of class intervals, each known as a bin.
unimodal distribution - correct answer ✅A unimodal distribution occurs when there is one (uni) prevalent peak (mode) in the histogram.
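
A short sketch of a frequency distribution for a categorical variable (each class is a category) and for a continuous variable (each class is a bin), using `collections.Counter` and NumPy's `histogram`; the data are hypothetical.

```python
from collections import Counter

import numpy as np

# Categorical variable: each class is a category.
colors = ["red", "blue", "red", "green", "blue", "red"]
freq = Counter(colors)
print(freq)   # Counter({'red': 3, 'blue': 2, 'green': 1})

# Continuous variable: each class is an interval (bin), as in a histogram.
heights_cm = np.array([151.2, 158.9, 160.4, 163.7, 165.0, 168.3, 171.9, 176.5, 182.1])
counts, edges = np.histogram(heights_cm, bins=4)   # 4 equal-width bins over the data range

# Each bin includes its left edge; the last bin also includes its right edge.
for lo, hi, c in zip(edges[:-1], edges[1:], counts):
    print(f"{lo:.1f} to {hi:.1f}: {c}")
```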

Probability - correct answer ✅Probability is a measure of how likely an event is to occur. The probability of an event A is denoted P(A), and is the sum of the probabilities of each outcome in the event.
Venn diagram - correct answer ✅A sample space and sets of events in the sample space can be represented visually with a Venn diagram.
union - correct answer ✅The union of two events A and B is denoted A∪B and is the event that includes outcomes in A or B or both.
intersection - correct answer ✅The intersection of two events A and B is denoted A∩B and is the event consisting of outcomes that are in both A and B.
complement - correct answer ✅The complement of an event A is denoted Ā and is the event consisting of outcomes that are not in A.
empty set - correct answer ✅The empty set, denoted ∅, is the event consisting of no outcomes.
mutually exclusive - correct answer ✅Two events A and B are mutually exclusive if A∩B=∅, that is, A and B have no outcomes in common.
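
A minimal sketch that represents events for one roll of a fair six-sided die as Python sets, so that union, intersection, complement, and mutual exclusivity become set operations; the die example and the helper `P` are hypothetical illustrations. Probabilities are kept exact with `fractions.Fraction`.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die.
S = {1, 2, 3, 4, 5, 6}

def P(event):
    """Probability of an event: the sum of its outcomes' probabilities (1/6 each here)."""
    return sum((Fraction(1, 6) for _ in event), Fraction(0))

A = {2, 4, 6}   # roll is even
B = {4, 5, 6}   # roll is at least 4
C = {1, 3}      # roll is 1 or 3

print(sorted(A | B))   # union A ∪ B: [2, 4, 5, 6]
print(sorted(A & B))   # intersection A ∩ B: [4, 6]
print(sorted(S - A))   # complement of A: [1, 3, 5]
print(P(A | B))        # 2/3
print(A & C == set())  # True, so A and C are mutually exclusive
```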

complement rule - correct answer ✅The complement rule relates the probability of an event to the probability of the complement of the event: P(Ā) = 1 − P(A).
addition rule - correct answer ✅The addition rule generalizes axiom 3 to events that are not mutually exclusive: P(A∪B) = P(A) + P(B) − P(A∩B).
independent - correct answer ✅Two events are independent if the occurrence of one event does not affect the probability of the other.
multiplication rule - correct answer ✅The multiplication rule gives the probability of two independent events happening together: P(A∩B) = P(A)P(B).
conditional probability - correct answer ✅The conditional probability of an event A given that event B has occurred is denoted P(A|B) and is equal to P(A∩B)/P(B), provided P(B) > 0.
law of total probability - correct answer ✅The law of total probability states that if the sample space is partitioned into two or more mutually exclusive subevents A₁, …, Aₙ, the probability of an event B can be expressed in terms of conditional probabilities given each of the subevents: P(B) = P(B|A₁)P(A₁) + … + P(B|Aₙ)P(Aₙ).
Bayes' Theorem - correct answer ✅Bayes' Theorem relates the probability of an event A given a condition B to the probability of the condition B given that the event A occurred. That is, Bayes' Theorem allows P(B|A) to be calculated from P(A|B): P(B|A) = P(A|B)P(B)/P(A).
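
A worked sketch of the complement rule, the law of total probability, and Bayes' Theorem on a hypothetical screening example; every number below is invented for illustration. Exact fractions make the arithmetic easy to check by hand.

```python
from fractions import Fraction

# Hypothetical screening example (numbers chosen for illustration only).
p_D = Fraction(1, 100)               # P(D): prevalence of a condition
p_pos_given_D = Fraction(9, 10)      # P(+|D): positive test when the condition is present
p_pos_given_notD = Fraction(1, 20)   # P(+|not D): false-positive rate

# Complement rule: P(not D) = 1 - P(D)
p_notD = 1 - p_D

# Law of total probability over the partition {D, not D}:
# P(+) = P(+|D) P(D) + P(+|not D) P(not D)
p_pos = p_pos_given_D * p_D + p_pos_given_notD * p_notD

# Bayes' Theorem: P(D|+) = P(+|D) P(D) / P(+), which equals P(D ∩ +) / P(+)
p_D_given_pos = p_pos_given_D * p_D / p_pos

print(p_pos)           # 117/2000
print(p_D_given_pos)   # 2/13
```

The result, P(D|+) = 2/13 (about 15%), is far below P(+|D) = 9/10, which is why confusing P(A|B) with P(B|A) can be misleading.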

discrete random variable - correct answer ✅A discrete random variable can take on a countable number of distinct values like the integers between 0 and 100.
continuous random variable - correct answer ✅A continuous random variable can take on any value within a range of values like the real numbers between 0 and 1.
probability mass function (pmf) - correct answer ✅A probability mass function (pmf) assigns the probability that a discrete random variable is exactly equal to some value (typically depicted as a table, plot, or equation).
cumulative distribution function (cdf) - correct answer ✅The cumulative distribution function (cdf) of a discrete random variable gives, for any number x, the probability P(X ≤ x) that the observed value of the random variable is at most x.
mean - correct answer ✅The mean or expected value μ of a discrete random variable X is the sum of each possible value of X multiplied by its probability: μ = Σ x·p(x).
expected value - correct answer ✅The mean or expected value μ of a discrete random variable X is the sum of each possible value of X multiplied by its probability: μ = Σ x·p(x).
variance - correct answer ✅The variance of a discrete random variable X is a measure of the spread of a distribution: σ² = Σ (x − μ)²·p(x).
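
A minimal sketch of a pmf, cdf, mean, and variance for a discrete random variable, assuming X is the result of one roll of a fair six-sided die (a hypothetical example); exact fractions make the results easy to verify.

```python
from fractions import Fraction

# pmf of X = result of one roll of a fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def cdf(x):
    """P(X <= x): add up the pmf over all values at most x."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))

# Mean (expected value): each value times its probability, summed.
mu = sum(x * p for x, p in pmf.items())

# Variance: squared distance from the mean, weighted by probability, summed.
var = sum((x - mu) ** 2 * p for x, p in pmf.items())

print(cdf(4))   # 2/3
print(mu)       # 7/2
print(var)      # 35/12
```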

standard deviation - correct answer ✅The standard deviation is the square root of the variance.
binomial distribution - correct answer ✅A binomial distribution is a discrete random variable distribution that gives the probability of each possible number of successes in n independent trials, where each trial has two possible outcomes with fixed probabilities that add up to 1.
Bernoulli distribution - correct answer ✅The Bernoulli distribution is the special case of a binomial distribution where n=1. The Bernoulli distribution is the probability distribution of a single trial with two possible outcomes.
Poisson distribution - correct answer ✅The Poisson distribution gives the probability of k independent, randomly occurring events happening over a period or area where λ events happen on average.
probability density function (pdf) - correct answer ✅A probability density function (pdf) describes the relative likelihood of all values for a continuous random variable.
cumulative distribution function (cdf) - correct answer ✅The cumulative distribution function (cdf) of a continuous random variable gives, for any number x, the probability P(X ≤ x) that the observed value of the random variable is at most x.
mean - correct answer ✅The mean μ or expected value E(X) of a continuous random variable X is a measure of the center of the distribution.
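
A minimal sketch of the binomial, Bernoulli, and Poisson pmfs using only Python's `math` module; the function names and parameter values are hypothetical. The Bernoulli case reuses the binomial pmf with n = 1, matching the definition above.

```python
from math import comb, exp, factorial, isclose

def binomial_pmf(k, n, p):
    """P(X = k): k successes in n independent trials, each with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def bernoulli_pmf(k, p):
    """Single trial (n = 1): P(X = 1) = p and P(X = 0) = 1 - p."""
    return binomial_pmf(k, 1, p)

def poisson_pmf(k, lam):
    """P(X = k) events in a period or area where lam events happen on average."""
    return lam**k * exp(-lam) / factorial(k)

print(binomial_pmf(2, 4, 0.5))         # 0.375
print(bernoulli_pmf(1, 0.3))           # 0.3
print(round(poisson_pmf(0, 2.0), 4))   # 0.1353

# A pmf's probabilities over all possible values add up to 1.
print(isclose(sum(binomial_pmf(k, 10, 0.3) for k in range(11)), 1.0))   # True
```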