




A set of questions and answers on data visualization, covering key concepts, techniques, and applications: data processing, querying, exploration, visualization tools, data types, visual variables, data analysis, and visualization techniques. Intended for students and professionals who want to understand the fundamentals of data visualization and its practical applications.
Typology: Exams





Data processing CORRECT ANSWER user knows what he or she wants and has a function to compute it
Querying CORRECT ANSWER user knows what he or she wants and can describe it
Navigation CORRECT ANSWER user knows what he or she wants but cannot describe it or locate it
Exploration CORRECT ANSWER user doesn't know what he or she wants; wants to acquire new knowledge and reveal new facts
5 Types of Exploration Tools CORRECT ANSWER 1. Analysis (identify common patterns or outliers)
Main components of a sequence CORRECT ANSWER starting point, ending point, length
Data Deluge CORRECT ANSWER A vast increase in the amount of data generated by individuals and businesses.
Dynamic visualization CORRECT ANSWER displays a single interactive visualization of data
Features CORRECT ANSWER vector representations of objects in a dataset
Schema CORRECT ANSWER structure and properties of a data collection
Schemas enable retrieval of data for what? CORRECT ANSWER Comparison, indexing, query optimization, and query processing
Qualitative color scheme CORRECT ANSWER best for nominal data
What are Bertin's 7 visual variables? CORRECT ANSWER 1. Position
Skewness CORRECT ANSWER measure of the asymmetry of the probability distribution
Which measure of central tendency will have the largest value? CORRECT ANSWER mean
How many categories should pie charts have at max? CORRECT ANSWER 7 to 9
T/F: Bar charts are best for showing trends over time CORRECT ANSWER F
Semi-structured data CORRECT ANSWER Can say "or" in the schema and can have null values or repetition. Data is self-describing: each item in the db describes its own schema.
Vector Data CORRECT ANSWER Takes unstructured data and turns it into a simple representation
Vector space CORRECT ANSWER basis vectors and a distance/similarity function
Basis vectors must be... CORRECT ANSWER complete and non-redundant
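The skewness card can be illustrated with a small sketch. It uses the moment-based sample skewness (third central moment divided by the 1.5 power of the second), which is one common definition and an assumption here, since the cards do not say which estimator the course uses. The function name `sample_skewness` and the sample data are hypothetical.

```python
import statistics


def sample_skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (population moments, no bias correction)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical right-skewed sample: most mass on the left, long tail to the right.
right_skewed = [1, 1, 1, 2, 2, 3, 4, 10]
symmetric = [1, 2, 3, 4, 5]

mode = statistics.mode(right_skewed)
median = statistics.median(right_skewed)
mean = statistics.fmean(right_skewed)

print("skewness (right-skewed):", sample_skewness(right_skewed))
print("skewness (symmetric):", sample_skewness(symmetric))
print("mode, median, mean:", mode, median, mean)
```

For this sample the long right tail pulls the mean above the median, so the mean is the largest of the three measures. That holds for right-skewed data like this, not for every distribution.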
Similarity / ranked queries CORRECT ANSWER Not all sub-goals need to be satisfied; we need to find out how important each sub-goal is
Subjective criteria (semantic gap/subjectivity) CORRECT ANSWER get relevance feedback on how important certain features are
Iceberg CORRECT ANSWER Add constraint to summary table to further reduce it
Nominal CORRECT ANSWER data whose categories have no implied ordering
Ordinal CORRECT ANSWER data that has a specified order, but no specified distance metric (S, M, L)
Interval CORRECT ANSWER data that has measurable distances and is additive (like temperature)
Ratio CORRECT ANSWER data that is measurable, is multiplicative, and has a 0 point
Data analysis CORRECT ANSWER prepare for visualization, smooth, transform to get prepared data
Filtering CORRECT ANSWER subset selected for visualization to get focused data
Mapping CORRECT ANSWER mapped to geometric primitives to get geometric data
Rendering CORRECT ANSWER transform to image data to get the image
What visual variable is the best way to represent quantitative dimensions? CORRECT ANSWER Position
What is the worst attribute to use for graphing data? CORRECT ANSWER Area & volume
What attribute must vary without impacting size or rotation? CORRECT ANSWER Shape
When can lines, areas, and surfaces only rotate? CORRECT ANSWER If they are positionally unconstrained
Granularity CORRECT ANSWER repetition of a pattern per unit of area
Orientation CORRECT ANSWER angle of the pattern
Fourier transforms CORRECT ANSWER Decompose grid of brightness into sums of trigonometric components
Auto correlogram CORRECT ANSWER characterize the spatial movements of a texture
Sequential color scheme CORRECT ANSWER grayscale is a particular case; represents ordered data; intuitive, but limited number of distinguishable colors
Divergent color scheme CORRECT ANSWER lacks natural ordering, but has a 0 point; careful choice for high and low end
Multivariate color schemes CORRECT ANSWER maps multiple features to colors; increases cognitive load
Data distribution CORRECT ANSWER affects how it should be analyzed and visualized; key step is pre-conditioning data
Positive skew CORRECT ANSWER biggest hump on left; mode < median < mean
Negative skew CORRECT ANSWER biggest hump on right; mean < median < mode
What do pie charts use and what are the key aspects? CORRECT ANSWER angle, area, and arc length; the last two are key
What are line charts best for? CORRECT ANSWER comparing over time
What are bar charts used for? CORRECT ANSWER comparing between groups
Heckbert, Nice numbers CORRECT ANSWER optimizes the number of tick marks, but for small numbers the range can be much larger than the data range; you can drop the extra ticks, but then the axis becomes uneven
Wilkinson's algorithm CORRECT ANSWER (format + font size + orientation + overlap) / 4
Sturges' formula CORRECT ANSWER k = ceil(log2(N) + 1) for k number of bins in histogram
Scott's choice CORRECT ANSWER h = 3.5 * stddev / N^(1/3), where h is the bin width that determines the number of bins k in histogram
Freedman-Diaconis CORRECT ANSWER h = 2 * IQR(x) * N^(-1/3), where h is the bin width that determines the number of bins k in histogram
Common choice for k bins in hist CORRECT ANSWER sqrt(N)
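The histogram binning rules on the cards above can be compared side by side with a short sketch. It makes several assumptions: Sturges' rule uses a base-2 logarithm, Scott's and Freedman-Diaconis give a bin width h that is turned into a bin count by dividing the data range by h, and the quartiles come from Python's `statistics.quantiles` with its default method. All function names are hypothetical.

```python
import math
import statistics


def sturges_bins(n):
    """Sturges' formula: k = ceil(log2(N) + 1)."""
    return math.ceil(math.log2(n) + 1)


def sqrt_bins(n):
    """Square-root choice: k = ceil(sqrt(N))."""
    return math.ceil(math.sqrt(n))


def scott_width(values):
    """Scott's choice: bin width h = 3.5 * stddev / N^(1/3)."""
    return 3.5 * statistics.stdev(values) / len(values) ** (1 / 3)


def freedman_diaconis_width(values):
    """Freedman-Diaconis: bin width h = 2 * IQR / N^(1/3)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return 2 * (q3 - q1) / len(values) ** (1 / 3)


def bins_from_width(values, width):
    """Convert a bin width into a bin count covering the data range."""
    return math.ceil((max(values) - min(values)) / width)


# Hypothetical data set: the integers 1..100.
data = list(range(1, 101))

print("Sturges:", sturges_bins(len(data)))
print("sqrt(N):", sqrt_bins(len(data)))
print("Scott:", bins_from_width(data, scott_width(data)))
print("Freedman-Diaconis:", bins_from_width(data, freedman_diaconis_width(data)))
```

The two width-based rules depend on the spread of the data, while Sturges' formula and the square-root choice depend only on N, so the four rules can disagree noticeably on the same data set.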
Decision Tree Learning CORRECT ANSWER Supervised. Multiple trees can come from the same data. Pure subset: instances all have the same class attribute. Uses entropy to make better selections.
Entropy CORRECT ANSWER -(p+) * log2(p+) - (p-) * log2(p-); 0 is better
KNN CORRECT ANSWER Assign the label that the majority of the k nearest neighbors have
Leave-one-out training CORRECT ANSWER Use all instances but one to train and the one left for testing
K-Fold cross validation CORRECT ANSWER Usually k = 10. Divide training set into k equal-sized sets. Run it k times: use all but set i for training and test with set i. Average the performance.
How much of the data should be used for testing (if you have enough data)? CORRECT ANSWER 1/
Error rate CORRECT ANSWER 1 - (correct/total)
Clustering CORRECT ANSWER group into similar clusters using distance to centroids
Convergence CORRECT ANSWER centroids are no longer changing
Euclidean distance CORRECT ANSWER sqrt(sum((xi - yi)^2))
What distance formula uses a covariance matrix? CORRECT ANSWER Mahalanobis
Silhouette Index CORRECT ANSWER Evaluates a clustered dataset based on cohesiveness and separateness. We are interested in clusterings where a(x) < b(x).
Cohesiveness formula CORRECT ANSWER take average distance within the cluster to the centroid, 1/(|C|-1) * sum(||x-y||)^
Silhouette formula CORRECT ANSWER (1/n) * sum((b(x) - a(x)) / max(b(x), a(x)))
Multiple regression analysis CORRECT ANSWER predict a dependent variable based on the levels of more than one independent variable
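The entropy, Euclidean distance, and silhouette cards above can be tried out with a minimal sketch. It rests on assumptions the cards do not confirm: entropy uses a base-2 logarithm over two classes, a(x) is the average distance from x to the other points in its own cluster, and b(x) is the smallest average distance from x to the points of any other cluster. The function names and the toy clusters are hypothetical.

```python
import math


def entropy(pos, neg):
    """Two-class entropy from counts; 0 means a pure subset, 1 an even split."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:  # treat 0 * log2(0) as 0
            p = count / total
            result -= p * math.log2(p)
    return result


def euclidean(x, y):
    """Euclidean distance: sqrt(sum((xi - yi)^2))."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def silhouette(clusters):
    """Mean of (b(x) - a(x)) / max(a(x), b(x)) over all points.

    Assumes at least two clusters, each with at least two points,
    and that not all points coincide.
    """
    scores = []
    for i, cluster in enumerate(clusters):
        for x in cluster:
            # a(x): the distance from x to itself is 0, so dividing the full
            # sum by |C| - 1 averages over the other members only.
            a = sum(euclidean(x, y) for y in cluster) / (len(cluster) - 1)
            # b(x): nearest other cluster by average distance.
            b = min(
                sum(euclidean(x, y) for y in other) / len(other)
                for j, other in enumerate(clusters)
                if j != i
            )
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical toy data: two tight, well-separated clusters.
clusters = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]

print("entropy of 5 vs 5:", entropy(5, 5))
print("entropy of 10 vs 0:", entropy(10, 0))
print("distance (0,0)-(3,4):", euclidean((0, 0), (3, 4)))
print("silhouette:", silhouette(clusters))
```

A silhouette value close to 1 corresponds to a(x) being much smaller than b(x) for most points, which is the case of interest named on the Silhouette Index card.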
Factor analysis CORRECT ANSWER reduce a set of variables to a smaller set of composite variables by identifying underlying dimensions of the data
Conjoint analysis CORRECT ANSWER estimate the utility that consumers associate with different product features