Data Visualization Final Exam Questions and Answers (CSE 578)

This is a set of questions and answers on data visualization, tailored to the CSE 578 course. It covers the benefits of data visualization, the differences between data processing and querying, data exploration and navigation, and common data challenges. It also covers data schemas, database structures, the curse of dimensionality, data transformation, and scalable data exploratory systems, along with visual variables, color schemes, and statistical concepts such as skewness and quantiles.

Typology: Exams

2025/2026

Available from 12/05/2025

supergrades1
CSE 578 DATA VISUALIZATION FINAL EXAM | (LATEST
EXAM 2025) QUESTIONS AND CORRECT ANSWERS |
ALREADY GRADED A+ | VERIFIED ANSWERS
Why is data visualization helpful? CORRECT ANSWER √√>>>1. amplifies cognition
2. expands working memory
3. reduces search time
4. improves pattern detection
5. controls attention
Describe the difference between data processing and querying CORRECT ANSWER √√>>>In
both instances, the user knows what they want. The difference is that with querying, they can
only describe it whereas in data processing they actually have a way to compute it.
Describe the difference between data exploration and navigation CORRECT ANSWER
√√>>>In exploration, the user does not know what they want but wants to get an idea about
the data. In navigation, the user DOES know what they want but does not know how to
describe/locate it.
What do these acronyms describe? INS, 3Vs, HMLE CORRECT ANSWER √√>>>Data
challenges
What does INS stand for and mean? CORRECT ANSWER √√>>>INS is a list of data
challenges: Imprecision, Noise, Sparsity
What does 3Vs stand for and mean? CORRECT ANSWER √√>>>3Vs is a list of data
challenges: Volume, Velocity, Variety

What does HMLE stand for and mean? CORRECT ANSWER √√>>>HMLE is a list of data challenges: High-dimensional, Multi-modal, Inter-Linked, Evolving
What is a data schema? CORRECT ANSWER √√>>>A set of constraints that...
1. describe the "properties" of data
2. describe the structure of data
3. enable validation & efficient storage of data
4. enable querying and retrieval of data
Advantages of a structured database? CORRECT ANSWER √√>>>1. Easier to query
2. Easier to optimize
3. Easier to explore
Advantages of semi-structured database? CORRECT ANSWER √√>>>Data organization is flexible/malleable (easier to integrate and exchange).
Describe the curse of dimensionality CORRECT ANSWER √√>>>The more dimensions we have, the more data we need to discover patterns (prevent overfitting).
True or False: The distance between two points is equal to the length of the distance vector. CORRECT ANSWER √√>>>True
Give an example of data transformation CORRECT ANSWER √√>>>A single Gender column holding "M" or "F" is transformed into two columns (one for M and one for F) holding 0s and 1s that indicate sex.
Which aspects of data should be handled by a scalable data exploratory system? CORRECT ANSWER √√>>>1. The amount of data
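The gender-column transformation described above is commonly called one-hot encoding. The following is a minimal sketch of that idea in plain Python; the function name `one_hot` and the sample column are hypothetical and not taken from the course material.

```python
def one_hot(values):
    """Turn a list of category labels into one 0/1 column per category."""
    categories = sorted(set(values))
    # Each new column holds 1 where the row had that category, else 0.
    return {c: [1 if v == c else 0 for v in values] for c in categories}


gender = ["M", "F", "F", "M"]  # hypothetical sample column
encoded = one_hot(gender)
print(encoded)  # {'F': [0, 1, 1, 0], 'M': [1, 0, 0, 1]}
```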

Give an example of ordinal data CORRECT ANSWER √√>>>Small, Medium, Large
What is ordinal data? CORRECT ANSWER √√>>>Data that has a specified order, but no specified distance metric.
What is interval data? CORRECT ANSWER √√>>>Data that has measurable distances but no true zero point (such as temperature in degrees Celsius).
What is ratio data? CORRECT ANSWER √√>>>Same as interval data, but with a true zero point (such as lengths read off a measuring tape).
In which step of the visualization pipeline is data prepared for the visualization (smooth, interpolate, transform)? CORRECT ANSWER √√>>>Data analysis
In which step of the visualization pipeline is a subset of the data selected for visualization? CORRECT ANSWER √√>>>Filtering
In which step of the visualization pipeline are data mapped to geometric primitives and their attributes? CORRECT ANSWER √√>>>Mapping
In which step of the visualization pipeline is geometric data transformed to image data? CORRECT ANSWER √√>>>Rendering
According to Bertin's Visual Variables, what is the best way to represent a quantitative dimension visually? CORRECT ANSWER √√>>>Position
True or False: Area and volume are among the best attributes to use for graphing data. CORRECT ANSWER √√>>>False. They are among the WORST attributes.

True or False: Objects in the skyline are not dominated by any other objects in the database. CORRECT ANSWER √√>>>True
Which data type is best suited for a rainbow color scheme? CORRECT ANSWER √√>>>Nominal because no ordering is implied.
Is a qualitative color scheme a univariate or multivariate color scheme? CORRECT ANSWER √√>>>Univariate
Sequential color schemes are best suited for which type of data? CORRECT ANSWER √√>>>Ordered data
A divergent color scheme is a univariate color scheme that is best suited for which type of data? CORRECT ANSWER √√>>>Ratio data where there is some meaningful zero point
In a positive skew, the curve slopes down toward which direction? CORRECT ANSWER √√>>>Slope is down from left to right
In a negative skew, the curve slopes down toward which direction? CORRECT ANSWER √√>>>Slope is down from right to left
How many visual variables did Bertin identify? How many can you name? CORRECT ANSWER √√>>>7 and they are: position, size, value, color, texture, orientation, and shape.
Can you describe "data deluge"? CORRECT ANSWER √√>>>A vast increase in the amount of data generated by individuals and businesses.
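The skyline statement above can be made concrete with a small sketch. It assumes that smaller values are better in every dimension (for example price and distance); that convention, the function names, and the sample data are illustrative assumptions, not part of the exam material.

```python
def dominates(a, b):
    """a dominates b if a is no worse in every dimension and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (price, distance) pairs; lower is better for both.
hotels = [(50, 8), (60, 2), (80, 1), (70, 5), (90, 9)]
print(skyline(hotels))  # [(50, 8), (60, 2), (80, 1)]
```

This quadratic scan is the simplest formulation; it compares every pair of points and is meant only to illustrate the definition.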

√√>>>Solution is to drop labels which overlap or fall outside the data range. This leads to unevenly spaced labels or axes with only one label.
____________ are points taken at regular intervals from the cumulative distribution function of a random variable. CORRECT ANSWER √√>>>Quantiles
What are the 5 main measures in a box-and-whisker plot? CORRECT ANSWER √√>>>Lower extreme, lower quartile, median, upper quartile, upper extreme
When a histogram has a tail that goes to the right, which way is it skewed? CORRECT ANSWER √√>>>Right skewed (AKA positive skew)
Using Sturges' formula, how many bins should there be for a dataset of 10 points? CORRECT ANSWER √√>>>Sturges' formula is K = 1 + log2(N), where K is the number of class intervals (bins) and N is the number of observations. The formula works best when the data are approximately normally distributed. For N = 10, K = 1 + log2(10) ≈ 4.32, which rounds up to 5 bins.
The first step in finding quantiles of a dataset is to? CORRECT ANSWER √√>>>Sort the data
19 23 26 30 33 35 38 38 40 42 45 45 47 56 What is the value of the first quartile in this dataset? CORRECT ANSWER √√>>>
How do you calculate the maximum number of data points that falls below the third quantile? CORRECT ANSWER √√>>>Divide the number of data points by the number of quantiles, and multiply by the quantile number. Example: Suppose that a dataset containing 36 data points is divided into 9 quantiles. 36 / 9 = 4, and 4 * 3 = 12.
On a box plot, which values can be considered outliers? CORRECT ANSWER √√>>>Values more than 1.5 times the inter-quartile range (IQR) below the lower quartile or above the upper quartile.
What are the two main ways of presenting multivariate data sets? CORRECT ANSWER √√>>>Directly (textually) - Tables and Symbolically (pictures) - Graphs
What is the term for the exploratory graphical technique that can help determine notable relationships between two variables? CORRECT ANSWER √√>>>Scagnostics
What can small multiples be used for? CORRECT ANSWER √√>>>1. Show snapshots of events that change over time
2. Make it possible to scan rapidly across a trellis of small similar charts
3. Make it possible to spot patterns easily
Mosaic plots allow you to examine the relationship among two or more _____________ variables. CORRECT ANSWER √√>>>categorical
What types of techniques can be used to boost pixel-based displays? CORRECT ANSWER √√>>>Halos, background coloring, distortion, and hatching
True or False: Pixel-based displays are not well-suited for large amounts of data. CORRECT ANSWER √√>>>False, they ARE well suited for large amounts of data!
For a parallel coordinate plot, should data be normalized? CORRECT ANSWER √√>>>Yes
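The bin count from the Sturges question above, K = 1 + log2(N), can be checked with a short sketch. Rounding up to the next whole bin is an assumption made here to match the stated answer of 5 bins for 10 points; the function name is hypothetical.

```python
import math


def sturges_bins(n_observations):
    """Number of histogram bins: K = 1 + log2(N), rounded up to a whole bin."""
    return math.ceil(1 + math.log2(n_observations))


print(sturges_bins(10))  # 1 + log2(10) is about 4.32, so 5 bins
```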

Which node has the smallest entropy?
1. A node with 10 black circles and 4 red triangles.
2. A node with 1 black circle and 15 red triangles.
3. A node with 7 black circles, 4 red triangles, and 3 green rectangles.
CORRECT ANSWER √√>>>Node 2. In a pure cluster or node, all instances have the same class attribute value. Node 2 is purer than Nodes 1 and 3, so its entropy is lower.
Describe how to find the K-nearest neighbor prediction with a set of data using Euclidean distance. CORRECT ANSWER √√>>>1. Find the distance between each point in the data set and the given query point (the instance to classify).
2. Sort the distances from shortest to longest and take the first K elements, which are the closest.
3. Look at how the K-nearest elements are classified and use the simple majority category as the prediction value of the instance.
In a pure subset, entropy is equal to what? What is the range of values that entropy can equal? CORRECT ANSWER √√>>>In a pure subset, all instances have the same class attribute. Entropy would be zero. The range of entropy is between 0 and 1 for two classes (and between 0 and log2(k) for k classes).
What is one way we can speed up the K-NN algorithm when it is classifying a new item added to the dataset? CORRECT ANSWER √√>>>Using cluster representations for comparison. Cluster representations would represent the entire cluster as a datapoint, which would reduce the number of computations required to classify the new items added to the dataset.
"Least squares" is a popular method for solving what? CORRECT ANSWER √√>>>Regression
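The entropy comparison for the three nodes above can be verified numerically. This sketch assumes Shannon entropy with base-2 logarithms, which is the usual choice for decision trees; the function name and the dictionary layout are illustrative.

```python
import math


def entropy(class_counts):
    """Shannon entropy (base 2) of a node, given the count of each class in it."""
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)


nodes = {
    1: [10, 4],    # 10 black circles, 4 red triangles
    2: [1, 15],    # 1 black circle, 15 red triangles
    3: [7, 4, 3],  # 7 black circles, 4 red triangles, 3 green rectangles
}
for name, counts in nodes.items():
    print(name, round(entropy(counts), 3))

purest = min(nodes, key=lambda name: entropy(nodes[name]))
print("smallest entropy: node", purest)
```

Node 3 has three classes, so its entropy can exceed 1; the upper bound of 1 applies only to two-class nodes.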

Regression can be solved by estimating what two things, using the provided data set and labels Y? CORRECT ANSWER √√>>>The vector of regression coefficients (W) and epsilon.
True or False: In supervised learning, testing data is used for determining when a model is overfitted, and can be used to evaluate the model. CORRECT ANSWER √√>>>True
K-fold cross validation is a resampling strategy used to evaluate the model. What does the parameter k refer to? CORRECT ANSWER √√>>>k is the number of groups into which the given data is split.
Suppose you have a dataset of 100 points. If the leave-one-out validation technique is used, how many times does the model need to be fit? CORRECT ANSWER √√>>>100. In the leave-one-out technique, we use all the instances but one for training. The one instance left is used for testing. If we have N instances, we use N-1 for training and 1 for testing.
Does k-NN learn? CORRECT ANSWER √√>>>No, it is considered lazy learning.
What is the formula for accuracy given a confusion matrix? CORRECT ANSWER √√>>>(TP + TN) / (TP + TN + FP + FN)
When using k-means, how are new centroids formed? CORRECT ANSWER √√>>>New centroids are formed by taking the mean of all the points in each cluster.
Which of these is NOT a stopping criterion for k-means clustering?
1. No points are shifting from one cluster to another.
2. Distance of data points from the centroids is maximum.
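k-NN is called lazy learning above because it does no work until a query arrives. The sketch below illustrates the three-step k-NN prediction (compute Euclidean distances, sort, take a majority vote); the function name, the data layout as `(point, label)` pairs, and the sample points are assumptions for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """train: list of (point, label) pairs; query: the point to classify."""
    # 1. Euclidean distance from the query to every training point.
    distances = [(math.dist(point, query), label) for point, label in train]
    # 2. Sort by distance and keep the k closest.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # 3. Simple majority vote among the k nearest labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"), ((6, 6), "B"), ((7, 6), "B")]
print(knn_predict(train, (2, 2), k=3))  # A
print(knn_predict(train, (6, 5), k=3))  # B
```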

Which measurement compares the average distance between instances in the same cluster to the average distance between instances in different clusters? CORRECT ANSWER √√>>>Silhouette index
Silhouette index lies between? CORRECT ANSWER √√>>>[-1, 1]
What is the best case measurement for silhouette? CORRECT ANSWER √√>>>Best measurement is 1, meaning the distance within the cluster is 0 and the distance between clusters is high.
What is the relationship between the number of clusters and cohesiveness in a clustering algorithm? CORRECT ANSWER √√>>>As the number of clusters increases, the value of cohesiveness decreases.
True or False: A scatterplot matrix can be used to identify the relationship between different categories of data in a multivariate dataset. CORRECT ANSWER √√>>>True, they enable the eye to efficiently and quickly identify variable pairings with strong or weak relationships.
What is the goal of the model evaluation process? CORRECT ANSWER √√>>>To see how the model is performing on the unseen data.
In k-means clustering, a point is considered to be in a particular cluster if it is closer to that cluster's _________ than any other ________. CORRECT ANSWER √√>>>Centroid
How is error calculated using accuracy in supervised learning? CORRECT ANSWER √√>>>1 - accuracy
What does k-fold cross-validation training do? CORRECT ANSWER √√>>>1. Divide the training set into some number of equally sized sets.
2. Run the algorithm the same number of times as the number of sets.
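The k-fold procedure above (split into k sets, run once per set) can be sketched as an index splitter. This is a minimal illustration that assumes each fold is used once as the test set while the rest is used for training; the function name is hypothetical, and no shuffling is done.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) pairs, one pair per fold."""
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `remainder` folds get one extra sample when n is not divisible by k.
        stop = start + fold_size + (1 if fold < remainder else 0)
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, test
        start = stop


for train, test in k_fold_splits(10, 5):
    print("test:", test, "train size:", len(train))

# Leave-one-out is the special case k = N: 100 points means 100 model fits.
print(len(list(k_fold_splits(100, 100))))  # 100
```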

Describe how you would complete the k-means algorithm. CORRECT ANSWER √√>>>1. Calculate the distance between each point and each centroid.
2. Assign each point to its nearest cluster center.
3. Calculate the mean for each cluster; those are the new centroids.
4. Repeat.
In which learning model do we divide the initial set of the data upon conditions and continue to do so until we have a subset that belongs to only one class? CORRECT ANSWER √√>>>Decision trees. A decision tree follows this same strategy until we get a pure subset.
How are supervised models tested if the dataset is too small? CORRECT ANSWER √√>>>Leave-one-out training and K-fold cross validation
What does Latent Dirichlet Allocation (LDA) achieve? CORRECT ANSWER √√>>>Topic modelling. LDA is used prior to text visualization; it tries to identify the topic based on word distribution.
How do you calculate bin width with standard deviation and the number of data points using Scott's formula? CORRECT ANSWER √√>>>bin_width = (3.5*std_dev) / (num_data_points^(1/3))
________________ is an exploratory querying tool used to identify high level patterns in the data. CORRECT ANSWER √√>>>Drill-down/rollup
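The k-means steps listed above (measure distances, assign points, recompute means, repeat) can be sketched in plain Python. The stopping rule used here, stop when the centroids no longer change, plus the function name and sample points, are assumptions for illustration.

```python
import math


def k_means(points, centroids, max_iter=100):
    """points and centroids are lists of equal-length tuples; max_iter must be >= 1."""
    for _ in range(max_iter):
        # Steps 1-2: assign every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 3: each new centroid is the mean of its cluster (kept as-is if empty).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
        # Step 4: repeat until the centroids stop moving.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, [(1, 1), (8, 8)])
print(centroids)
print(clusters)
```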

In which supervised learning method is linear approximation used? CORRECT ANSWER √√>>>Regression
True or False: Ordered time domains consider things that happen one after another. CORRECT ANSWER √√>>>True
__________ time considers multiple what-if scenarios, allowing comparison of alternate scenarios. CORRECT ANSWER √√>>>Branching
True or False: For temporal analysis design principles, spatial position should not be used as a visual cue. CORRECT ANSWER √√>>>False, it is the strongest visual cue.
True or False: For temporal analysis design principles, we should provide side-by-side comparisons of small multiple views. CORRECT ANSWER √√>>>True
What is a control chart? CORRECT ANSWER √√>>>A graph used to study how a process changes over time. It always has a central line for average, an upper line for upper control limit, and a lower line for lower control limit.
When should we use a control chart? CORRECT ANSWER √√>>>1. Controlling ongoing processes by finding and correcting problems as they occur.
2. Predicting expected range of outcomes from a process.
3. Determining whether a process is stable.
4. Analyzing patterns of process variation from special causes or common causes.
5. Determining whether the quality improvement project should aim to prevent specific problems or to make fundamental changes to the process.
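The three lines of a control chart described above (central line, upper control limit, lower control limit) can be computed with a short sketch. It assumes the common convention of placing the limits three sample standard deviations from the mean; the source does not state how the limits are derived, so that choice, the function names, and the sample measurements are all assumptions.

```python
import statistics


def control_limits(baseline, sigmas=3):
    """Return (lower limit, center line, upper limit) from baseline measurements."""
    center = statistics.mean(baseline)
    spread = statistics.stdev(baseline)  # sample standard deviation
    return center - sigmas * spread, center, center + sigmas * spread


def out_of_control(new_points, baseline):
    """List the new measurements that fall outside the control limits."""
    lower, _, upper = control_limits(baseline)
    return [x for x in new_points if x < lower or x > upper]


baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0]  # hypothetical in-control measurements
lower, center, upper = control_limits(baseline)
print(round(lower, 3), round(center, 3), round(upper, 3))
print(out_of_control([10.1, 10.6, 9.9], baseline))  # [10.6]
```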

What is a moving average chart? CORRECT ANSWER √√>>>1. Monitors process location over time.
2. Generally used for detecting small shifts in process mean.
3. Control limits are derived from average range on Range Chart
What is a range chart? CORRECT ANSWER √√>>>1. Monitors process variation over time
2. Should be reviewed before moving average chart
Give an example of a prefix search CORRECT ANSWER √√>>>Find all strings that start with "tab":
  • "table"; "tabular"; "tablet"; ....
Give an example of a subsequence search CORRECT ANSWER √√>>>Find all strings that contain the subsequence "ark":
  • "marketing"; "spark"; "quark"
or Find all occurrences of "acd":
  • "aabacdcdabdcababdacddcab."
What is the brute force approach for subsequence search? CORRECT ANSWER √√>>>Scan the sequence, while aligning the pattern for each position in the sequence
Tries work well if we search for a what? CORRECT ANSWER √√>>>prefix
Consider this sample of street addresses: 1212 N CLARK ST 2360 W ADDISON ST
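The brute-force approach described above (align the pattern at every position and compare character by character) can be sketched as follows. The comparison counter is added here only to make the worst-case cost of M*(N - M + 1) visible; the function name is hypothetical.

```python
def brute_force_search(text, pattern):
    """Return (start positions of all matches, number of character comparisons)."""
    n, m = len(text), len(pattern)
    positions, comparisons = [], 0
    for start in range(n - m + 1):  # try every alignment of the pattern
        matched = 0
        while matched < m:
            comparisons += 1
            if text[start + matched] != pattern[matched]:
                break
            matched += 1
        if matched == m:
            positions.append(start)
    return positions, comparisons


print(brute_force_search("aabacdcdabdcababdacddcab", "acd")[0])  # [3, 17]
# Worst case for M = 2, N = 10: every alignment needs both comparisons.
print(brute_force_search("aaaaaaaaaa", "ab"))  # ([], 18)
```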

With a brute-force approach, what is the worst case cost of finding a 2-character substring in a string of 10 characters? CORRECT ANSWER √√>>>18. The maximum number of comparisons made to find a substring of length M in a string of length N is M*(N - M + 1), and 2*(10 - 2 + 1) = 18.
______ ______ is a trie representation of a string, with suffixes of given text as key and position in the text as value. CORRECT ANSWER √√>>>Suffix tree
Suppose you are using the KMP algorithm to search a pattern of length M in a string of length N. Which statement does NOT accurately identify characteristics of the KMP algorithm?
1. The running time is O(NM).
2. The KMP alg never needs to move backward in the input text.
3. The running time is linear, O(M+N).
4. The KMP alg minimizes the total number of comparisons of the pattern against the input string.
CORRECT ANSWER √√>>>1. The running time is O(NM). This is a false statement.
What is the cost of computing the edit distance between two strings of length M and N? CORRECT ANSWER √√>>>O(MN)
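The O(MN) cost of edit distance comes from filling a table with one cell per pair of prefixes. The sketch below assumes the standard Levenshtein formulation (insert, delete, substitute, each costing 1), which the source does not spell out; the function name is illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming over an (M+1) x (N+1) table."""
    m, n = len(a), len(b)
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i  # delete all i characters of a
    for j in range(n + 1):
        dist[0][j] = j  # insert all j characters of b
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            substitution = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dist[m][n]


print(edit_distance("kitten", "sitting"))  # 3
print(edit_distance("table", "tablet"))    # 1
```

Each of the M x N cells is filled in constant time, which gives the O(MN) running time stated above.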