Data Visualization Exam Questions and Answers: A Comprehensive Review, Exams of Nursing

A series of questions and answers related to data visualization, covering topics such as the benefits of data visualization, differences between data processing and querying, data challenges (INS, 3Vs, HMLE), data schemas, advantages of structured and semi-structured databases, the curse of dimensionality, data transformation, scalable data exploratory systems, prefix and subsequence searches, edit distance, skyline queries, visual variable data types (nominal, ordinal, interval, ratio), the visualization pipeline, Bertin's visual variables, color schemes, data skewness, data deluge, schema, dynamic visualization, vector representation, pie charts, line charts, Heckbert's labeling algorithm, quantiles, box-and-whisker plots, Sturges' formula, outliers, multivariate data sets, and scagnostics. It serves as a study guide or exam preparation material for students in data visualization courses, offering concise explanations and examples for key concepts.

Typology: Exams

2024/2025

Available from 06/29/2025

eric-studyguide
CSE 578 DATA VISUALIZATION NEWEST MIDTERM EXAM |
(LATEST EXAM 2025) QUESTIONS AND CORRECT
ANSWERS | ALREADY GRADED A+ | VERIFIED ANSWERS
Why is data visualization helpful? CORRECT ANSWER √√>>>1. amplifies cognition
2. expands working memory
3. reduces search time
4. improves pattern detection
5. controls attention
Describe the difference between data processing and querying CORRECT ANSWER √√>>>In
both instances, the user knows what they want. The difference is that with querying, they can
only describe it whereas in data processing they actually have a way to compute it.
Describe the difference between data exploration and navigation CORRECT ANSWER √√>>>In
exploration, the user does not know what they want but wants to get an idea about the data. In
navigation, the user DOES know what they want but does not know how to describe/locate it.
What do these acronyms describe? INS, 3Vs, HMLE CORRECT ANSWER √√>>>Data challenges
What does INS stand for and mean? CORRECT ANSWER √√>>>INS is a list of data challenges:
Imprecision, Noise, Sparsity
What does 3Vs stand for and mean? CORRECT ANSWER √√>>>3Vs is a list of data challenges:
Volume, Velocity, Variety

What does HMLE stand for and mean? CORRECT ANSWER √√ >>>HMLE is a list of data challenges: High-dimensional, Multi-modal, Inter-Linked, Evolving
What is a data schema? CORRECT ANSWER √√ >>>A set of constraints that...
1. describe the "properties" of data
2. describe the structure of data
3. enable validation & efficient storage of data
4. enable querying and retrieval of data
Advantages of a structured database? CORRECT ANSWER √√ >>>1. Easier to query
2. Easier to optimize
3. Easier to explore
Advantages of semi-structured database? CORRECT ANSWER √√ >>>Data organization is flexible/malleable (easier to integrate and exchange).
Describe the curse of dimensionality CORRECT ANSWER √√ >>>The more dimensions we have, the more data we need to discover patterns (prevent overfitting).
True or False: The distance between two points is equal to the length of the distance vector. CORRECT ANSWER √√ >>>True
Give an example of data transformation CORRECT ANSWER √√ >>>A single gender column holding "M" or "F" is transformed into two columns (one for M and one for F) holding 0s and 1s.
Which aspects of data should be handled by a scalable data exploratory system? CORRECT
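The data transformation example above (one "M"/"F" column becoming two 0/1 columns) is commonly called one-hot encoding. A minimal sketch follows; the function name `one_hot` and the sample column are illustrative assumptions, not part of the exam material.

```python
def one_hot(values):
    """Turn a list of category labels into one 0/1 column per category."""
    categories = sorted(set(values))  # e.g. ["F", "M"]
    return {
        category: [1 if value == category else 0 for value in values]
        for category in categories
    }


# Hypothetical sample column, invented for illustration.
gender = ["M", "F", "F", "M"]
encoded = one_hot(gender)
print(encoded)
```

Each row has exactly one 1 across the new columns, so the original label can always be recovered.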

True or False: For nominal data, order matters. CORRECT ANSWER √√ >>>False. Nominal data is data whose categories have no implied ordering.
Give an example of ordinal data CORRECT ANSWER √√ >>>Small, Medium, Large
What is ordinal data? CORRECT ANSWER √√ >>>Data that has a specified order, but no specified distance metric.
What is interval data? CORRECT ANSWER √√ >>>Data that has measurable distances (such as temperature).
What is ratio data? CORRECT ANSWER √√ >>>Same as interval data, but includes a zero point (measuring tape).
In which step of the visualization pipeline is data prepared for the visualization (smooth, interpolate, transform)? CORRECT ANSWER √√ >>>Data analysis
In which step of the visualization pipeline is a subset of the data selected for visualization? CORRECT ANSWER √√ >>>Filtering
In which step of the visualization pipeline are data mapped to geometric primitives and their attributes? CORRECT ANSWER √√ >>>Mapping
In which step of the visualization pipeline is geometric data transformed to image data? CORRECT ANSWER √√ >>>Rendering
According to Bertin's Visual Variables, what is the best way to represent a quantitative dimension visually? CORRECT ANSWER √√ >>>Position

True or False: Area and volume are among the best attributes to use for graphing data. CORRECT ANSWER √√ >>>False. They are among the WORST attributes.
True or False: Objects in the skyline are not dominated by any other objects in the database. CORRECT ANSWER √√ >>>True
Which data type is best suited for a rainbow color scheme? CORRECT ANSWER √√ >>>Nominal because no ordering is implied.
Is a qualitative color scheme a univariate or multivariate color scheme? CORRECT ANSWER √√ >>>Univariate
Sequential color schemes are best suited for which type of data? CORRECT ANSWER √√ >>>Ordered data
A divergent color scheme is a univariate color scheme that is best suited for which type of data? CORRECT ANSWER √√ >>>Ratio data where there is some meaningful zero point
In a positive skew, the curve slopes down toward which direction? CORRECT ANSWER √√ >>>The curve slopes down from left to right (the long tail is on the right).
In a negative skew, the curve slopes down toward which direction? CORRECT ANSWER √√ >>>The curve slopes down from right to left (the long tail is on the left).
How many visual variables did Bertin identify? How many can you name? CORRECT ANSWER √√ >>>7 and they are: position, size, value, color, texture, orientation, and shape.
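The skyline statement above (skyline objects are not dominated by any other object) can be illustrated with a short sketch. It assumes that smaller values are better in every dimension; the names `dominates` and `skyline` and the hotel data are hypothetical.

```python
def dominates(a, b):
    """a dominates b if a is no worse in every dimension and strictly
    better in at least one (assumption: smaller is better)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (price, distance-to-beach) pairs, invented for illustration.
hotels = [(50, 8), (60, 2), (80, 1), (70, 5), (90, 9)]
result = skyline(hotels)
print(result)
```

This is the simple quadratic-time formulation; it compares every pair of points and is meant only to make the dominance definition concrete.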

Which type of chart is best for changes over time? CORRECT ANSWER √√ >>>Line chart
Heckbert's Labeling Algorithm addresses the problem: for small numbers, the range of labels can be much larger than the data range. How is it addressed? CORRECT ANSWER √√ >>>Solution is to drop labels which overlap or fall outside the data range. This leads to unevenly spaced labels or axes with only one label.
____________ are points taken at regular intervals from the cumulative distribution function of a random variable. CORRECT ANSWER √√ >>>Quantiles
What are the 5 main measures in a box-and-whisker plot? CORRECT ANSWER √√ >>>Lower extreme, lower quartile, median, upper quartile, upper extreme
When a histogram has a tail that goes to the right, which way is it skewed? CORRECT ANSWER √√ >>>Right skewed (AKA positive skew)
Using Sturges' formula, how many bins should there be for a dataset of 10 points? CORRECT ANSWER √√ >>>Sturges' formula is K = 1 + log2(N), where K is the number of class intervals (bins) and N is the number of observations. The formula assumes the data are approximately normally distributed. For N = 10, K = 1 + 3.32 = 4.32, which rounds up to 5 bins.
The first step in finding quantiles of a dataset is to? CORRECT ANSWER √√ >>>Sort the data
19 23 26 30 33 35 38 38 40 42 45 45 47 56
What is the value of the first quartile in this dataset? CORRECT ANSWER √√ >>>
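The Sturges bin count for 10 observations can be checked with a few lines of code. This is a sketch that assumes the result is rounded up to the next whole bin, which matches the answer of 5 given above; the function name `sturges_bins` is illustrative.

```python
import math


def sturges_bins(n):
    """Number of histogram bins K = 1 + log2(N), rounded up (assumed)."""
    return math.ceil(1 + math.log2(n))


print(sturges_bins(10))   # 1 + 3.32 = 4.32, rounded up
print(sturges_bins(100))  # 1 + 6.64 = 7.64, rounded up
```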

How do you calculate the maximum number of data points that fall below the third quartile? CORRECT ANSWER √√ >>>Divide the number of data points by the number of quantiles, and multiply by the quantile number. Example: Suppose that a dataset containing 36 data points is divided into 9 quantiles. 36 / 9 = 4, and 4 * 3 = 12.
On a box plot, which values can be considered outliers? CORRECT ANSWER √√ >>>Values more than 1.5 times the inter-quartile range (IQR) beyond the upper or lower quartile.
What are the two main ways of presenting multivariate data sets? CORRECT ANSWER √√ >>>Directly (textually) - Tables and Symbolically (pictures) - Graphs
What is the term for the exploratory graphical technique that can help determine notable relationships between two variables? CORRECT ANSWER √√ >>>Scagnostics
What can small multiples be used for? CORRECT ANSWER √√ >>>1. Show snapshots of events that change over time
2. Make it possible to scan rapidly across a trellis of small similar charts
3. Make it possible to spot patterns easily
Mosaic plots allow you to examine the relationship among two or more _____________ variables. CORRECT ANSWER √√ >>>categorical
What types of techniques can be used to boost pixel-based displays? CORRECT ANSWER √√ >>>Halos, background coloring, distortion, and hatching
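The box plot outlier rule mentioned earlier (values far from the quartiles, measured in multiples of the IQR) can be sketched as below. The multiplier `k` is left as a parameter; 1.5 is the common Tukey convention and is used as the default here. Quartile values depend on the interpolation method, so the numbers produced by `statistics.quantiles` are one convention among several. The sample data is invented for illustration.

```python
import statistics


def iqr_outliers(data, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]


# Hypothetical measurements with one obviously extreme value.
values = [10, 12, 13, 14, 15, 16, 18, 100]
outliers = iqr_outliers(values)
print(outliers)
```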

The objective of a supervised learning model is what? CORRECT ANSWER √√ >>>To predict the correct label for a newly presented input data
In supervised learning, for every observation of the feature measurement(s), x_i, i = 1, ..., n there is an associated _______ _______ y_i. CORRECT ANSWER √√ >>>response measurement

Which node has the smallest entropy?
1. A node with 10 black circles and 4 red triangles.
2. A node with 1 black circle and 15 red triangles.
3. A node with 7 black circles, 4 red triangles, and 3 green rectangles.
CORRECT ANSWER √√ >>>Node 2. A pure cluster or node will have instances that all have the same class attribute values. As Node 2 is purer than Nodes 1 and 3, its entropy is lower.
Describe how to find the K-nearest neighbor prediction with a set of data using Euclidean distance. CORRECT ANSWER √√ >>>1. Find the distance between each data point and the given query point (the instance to classify).
2. Sort the distances from shortest to longest and take the first K elements, which are the closest.
3. Look at how the K nearest elements are classified and use the simple majority category as the prediction for the instance.
In a pure subset, entropy is equal to what? What is the range of values that entropy can equal? CORRECT ANSWER √√ >>>In a pure subset, all instances have the same class attribute, so entropy is zero. For two classes (using log base 2), entropy ranges between 0 and 1.
What is one way we can speed up the K-NN algorithm when it is classifying a new item added to the dataset? CORRECT ANSWER √√ >>>Using cluster representations for comparison. Cluster representations would represent the entire cluster as a datapoint, which would reduce the number of computations required to classify the new items added to the dataset.
"Least squares" is a popular method for solving what? CORRECT ANSWER √√ >>>Regression
Regression can be solved by estimating what two things, using the provided data set and labels Y? CORRECT ANSWER √√ >>>The vector of regression coefficients (W) and epsilon.
  1. The centroids have stopped changing. CORRECT ANSWER √√ >>>Number 2 is NOT a stopping criterion.
Is K means clustering supervised or unsupervised? CORRECT ANSWER √√ >>>Unsupervised
In cyclical time, does the ordering of points matter? CORRECT ANSWER √√ >>>No, it's meaningless.
Grouping together similar items is a form of unsupervised learning called what? CORRECT ANSWER √√ >>>Clustering
True or False: Clustering algorithm is a form of unsupervised learning that requires a dissimilarity measure. CORRECT ANSWER √√ >>>True.
True or False: Euclidean distance is used to calculate the co-variance matrix. CORRECT ANSWER √√ >>>False. It is Mahalanobis distance, not Euclidean distance, that involves the co-variance matrix (the distance is computed using its inverse).
When ground truth is available, what is the best way to evaluate unsupervised learning? CORRECT ANSWER √√ >>>Accuracy
When ground truth is not available, how is unsupervised learning evaluated? CORRECT ANSWER √√ >>>Cohesiveness and Separateness. Cohesiveness: instances inside clusters are close to each other. Separateness: clusters are well-separated from each other.
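The K-nearest-neighbor procedure described earlier (compute Euclidean distances to the query point, sort, then take a majority vote among the K closest) can be sketched as follows. The function name `knn_predict` and the labeled points are illustrative assumptions.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """train is a list of (point, label) pairs; return the majority
    label among the k training points closest to query."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _point, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical labeled points, invented for illustration.
train = [
    ((1, 1), "red"), ((1, 2), "red"), ((2, 1), "red"),
    ((8, 8), "blue"), ((9, 8), "blue"),
]
print(knn_predict(train, (2, 2), k=3))
print(knn_predict(train, (8, 9), k=3))
```

Every prediction scans the whole training set, which is why the cluster-representative shortcut mentioned in the answers can reduce the number of distance computations.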

What measurement compares average distance value between instances in the same cluster to average distance values between instances in different clusters? CORRECT ANSWER √√ >>>Silhouette index
Silhouette index lies between? CORRECT ANSWER √√ >>>[-1, 1]
What is the best case measurement for silhouette? CORRECT ANSWER √√ >>>Best measurement is 1, meaning the distance within the cluster is 0 and the distance between clusters is high.
What is the relationship between the number of clusters and cohesiveness in a clustering algorithm? CORRECT ANSWER √√ >>>As the number of clusters increases, the value of cohesiveness decreases.
True or False: A scatterplot matrix can be used to identify the relationship between different categories of data in a multivariate dataset. CORRECT ANSWER √√ >>>True, they enable the eye to efficiently and quickly identify variable pairings with strong or weak relationships.
What is the goal of the model evaluation process? CORRECT ANSWER √√ >>>To see how the model is performing on the unseen data.
In k-means clustering, a point is considered to be in a particular cluster if it is closer to that cluster's _________ than any other ________. CORRECT ANSWER √√ >>>Centroid
How is error calculated using accuracy in supervised learning? CORRECT ANSWER √√ >>>1 - accuracy
What does k-fold cross-validation training do? CORRECT ANSWER √√ >>>1. Divide training sets into some number of equally-sized sets.
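The silhouette index described above can be made concrete for a single point. This sketch assumes the usual definition s = (b - a) / max(a, b), where a is the average distance to the other points in the same cluster and b is the smallest average distance to the points of any other cluster. The function name and the two small clusters are hypothetical.

```python
import math


def silhouette(point, own_cluster, other_clusters):
    """Silhouette value of one point, in the range [-1, 1].

    Assumes own_cluster has at least one other point and that a and b
    are not both zero.
    """
    neighbours = [p for p in own_cluster if p != point]
    a = sum(math.dist(point, p) for p in neighbours) / len(neighbours)
    b = min(
        sum(math.dist(point, p) for p in cluster) / len(cluster)
        for cluster in other_clusters
    )
    return (b - a) / max(a, b)


# Two well-separated hypothetical clusters.
cluster_a = [(0, 0), (0, 1), (1, 0)]
cluster_b = [(10, 10), (10, 11)]
score = silhouette((0, 0), cluster_a, [cluster_b])
print(round(score, 3))
```

A score close to 1 means the point is much nearer to its own cluster than to any other, which matches the best case described in the answers.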

________________ is an exploratory querying tool used to identify high level patterns in the data. CORRECT ANSWER √√ >>>Drill-down/rollup
What is a disadvantage of using Chernoff faces? CORRECT ANSWER √√ >>>A single Chernoff face is not sufficient to get an idea of the attributes belonging to the data. We need at least 2 faces to compare.
Suppose you have a set of ratio data with a meaningful zero point. Which color scheme is most suitable for visualizing this data? CORRECT ANSWER √√ >>>Divergent
What do we call points that are taken at regular intervals from the cumulative distribution function of a random variable? CORRECT ANSWER √√ >>>Quantiles
In which step of the supervised learning process do we give the model an unlabelled dataset to get predictions? CORRECT ANSWER √√ >>>testing
What is the L1 norm between two points? CORRECT ANSWER √√ >>>L1 norm distance is the sum of the absolute differences over all coordinates. For example, for (1, 1) and (2, 2), the L1 norm is 2.
What is the formula to calculate the entropy of a subset in a decision tree? CORRECT ANSWER √√ >>>-p log p - n log n, where p represents the probability of positive class in the subset and n represents the probability of negative class in the subset.
For the histogram width-based formula, how is the number of bins calculated? CORRECT ANSWER √√ >>>⌈(max x - min x) / h⌉, the ceiling of (max x - min x) / h, where h is the bin width.
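The three formulas above (L1 distance, two-class entropy, and the width-based bin count) are small enough to check directly. This sketch assumes log base 2 for the entropy, which the source does not state, and treats 0 * log 0 as 0; the function names and the sample values are illustrative.

```python
import math


def l1_distance(p, q):
    """Sum of absolute coordinate differences between two points."""
    return sum(abs(a - b) for a, b in zip(p, q))


def binary_entropy(p):
    """-p log2(p) - n log2(n) with n = 1 - p (log base 2 is assumed)."""
    n = 1 - p
    return sum(-x * math.log2(x) for x in (p, n) if x > 0)


def bins_from_width(values, h):
    """Ceiling of (max - min) / h, where h is the bin width."""
    return math.ceil((max(values) - min(values)) / h)


print(l1_distance((1, 1), (2, 2)))
print(binary_entropy(0.5))
print(bins_from_width([2, 5, 9, 14], h=5))
```

With base 2, an evenly split two-class subset has entropy 1 and a pure subset has entropy 0, which is consistent with the range quoted in the entropy answers.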

What are some potential problems with multivariate analysis? CORRECT ANSWER √√ >>>1. Regular charts like parallel coordinate plots may become too congested.
2. Multivariate analysis comes with the curse of dimensionality.
3. Multivariate analysis may reveal correlations between unrelated variables.
In which supervised learning method is linear approximation used? CORRECT ANSWER √√ >>>Regression
True or False: Ordered time domains consider things that happen one after another. CORRECT ANSWER √√ >>>True
__________ time considers multiple what-if scenarios, allowing comparison of alternate scenarios. CORRECT ANSWER √√ >>>Branching
True or False: For temporal analysis design principles, spatial position should not be used as a visual cue. CORRECT ANSWER √√ >>>False, it is the strongest visual cue.
True or False: For temporal analysis design principles, we should provide side-by-side comparisons of small multiple views. CORRECT ANSWER √√ >>>True
What is a control chart? CORRECT ANSWER √√ >>>A graph used to study how a process changes over time. It always has a central line for average, an upper line for upper control limit, and a lower line for lower control limit.
When should we use a control chart? CORRECT ANSWER √√ >>>1. Controlling ongoing processes by finding and correcting problems as they occur.
2. Predicting expected range of outcomes from a process.
3. Determining whether a process is stable.
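The three lines of a control chart (center line, upper control limit, lower control limit) can be sketched as below. The source does not say how the limits are set; placing them at the mean plus or minus three sample standard deviations is an assumed common convention, and real control charts often estimate the spread differently. The function names and measurements are hypothetical.

```python
import statistics


def control_limits(baseline, sigmas=3):
    """Return (lower limit, center line, upper limit).

    Assumption: limits sit `sigmas` sample standard deviations from
    the mean of the baseline measurements.
    """
    center = statistics.mean(baseline)
    spread = statistics.stdev(baseline)
    return center - sigmas * spread, center, center + sigmas * spread


def out_of_control(samples, baseline):
    """Samples that fall outside the limits derived from baseline."""
    lower, _center, upper = control_limits(baseline)
    return [x for x in samples if x < lower or x > upper]


# Hypothetical process measurements, invented for illustration.
baseline = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10]
flagged = out_of_control([10, 14, 9, 6], baseline)
print(control_limits(baseline))
print(flagged)
```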

Consider this sample of street addresses:

1212 N CLARK ST

2360 W ADDISON ST

1239 W GRANVILLE AVE

2722 N CLARK ST

8902 N BROAD LANE

Which technique would be best for extracting all addresses that have a "W" after the house number? CORRECT ANSWER √√ >>>Subsequence. A subsequence is a string formed by removing some symbols from the original string. Subsequence search is useful for finding the subsequences in a larger regular expression.
Consider this sample of street addresses:
1221 N CLARK ST
2360 W ADDISON ST
1239 W GRANVILLE AVE
2712 N CLARK ST
8902 N BROADWAY
Which technique would be best for extracting addresses where the street number starts with 12? CORRECT ANSWER √√ >>>Prefix search is the best technique for identifying and extracting the patterns that match the beginning of a string.
Let S be a set of s strings from alphabet Σ such that no string in S is a prefix of another string. If T is the trie for S, then how many leaves does T have? CORRECT ANSWER √√ >>>s
Which of the following is not a subsequence of 'GCFITQSPPN'?
IST
CPN
FQSN
GIT
CORRECT ANSWER √√ >>>IST is not a subsequence of 'GCFITQSPPN' because its letters do not appear in that order in the string: T comes before S. (A subsequence does not have to be contiguous, but it must preserve order.)
With a brute-force approach, what is the worst case cost of finding a 2-character substring in a string of 10 characters? CORRECT ANSWER √√ >>>18. The maximum number of comparisons made to find a substring of length M in a string of length N is M*(N - M + 1).
______ ______ is a trie representation of a string, with suffixes of given text as key and position in the text as value. CORRECT ANSWER √√ >>>Suffix tree
Suppose you are using the KMP algorithm to search a pattern of length M in a string of length N. Which statement does NOT accurately identify characteristics of the KMP algorithm?

1. The running time is O(NM).
2. The KMP algorithm never needs to move backward in the input text.
3. The running time is linear, O(M+N).
4. The KMP algorithm minimizes the total number of comparisons of the pattern against the input string.
CORRECT ANSWER √√ >>>1. The running time is O(NM). This is a false statement.
What is the cost of computing the edit distance between two strings of length M and N? CORRECT ANSWER √√ >>>O(MN)
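Several of the string-search answers above can be verified with a short sketch: prefix matching on the address sample, the subsequence test for 'GCFITQSPPN', the brute-force worst case M*(N - M + 1), and an O(MN) dynamic-programming edit distance. The edit distance here assumes unit costs for insert, delete, and substitute (Levenshtein distance), which the source does not specify; all function names are illustrative.

```python
def is_subsequence(candidate, text):
    """True if candidate's characters appear in text in the same order."""
    remaining = iter(text)
    return all(ch in remaining for ch in candidate)


def brute_force_worst_case(m, n):
    """Worst-case character comparisons for a length-m pattern in a
    length-n text with the naive algorithm."""
    return m * (n - m + 1)


def edit_distance(a, b):
    """Levenshtein distance; fills an (M+1) x (N+1) table row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            substitute = previous[j - 1] + (0 if ca == cb else 1)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitute))
        previous = current
    return previous[-1]


addresses = ["1221 N CLARK ST", "2360 W ADDISON ST", "1239 W GRANVILLE AVE",
             "2712 N CLARK ST", "8902 N BROADWAY"]
starts_with_12 = [a for a in addresses if a.startswith("12")]

print(starts_with_12)
print([s for s in ("IST", "CPN", "FQSN", "GIT")
       if not is_subsequence(s, "GCFITQSPPN")])
print(brute_force_worst_case(2, 10))
print(edit_distance("kitten", "sitting"))
```

The nested loops in `edit_distance` visit each of the M x N table cells once, which is where the O(MN) cost comes from.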