






An analysis of four software programs used for data visualization: Eureka, InfoZoom, SeeIt, and Spotfire Pro. The author compares their attributes and performance on two data sets, mutual funds and films, discusses the strengths and weaknesses of each tool, and offers insights into their suitability for handling numeric and non-numeric data, filtering, and visualization.







McPherson White
2/15/
Info. Vis.
Visualization Tool Analysis

Each of the four software programs had attributes that it handled better than the others. The no free lunch theorem seems appropriate for describing how no program was able to handle every situation equally well. To start the process, I chose two data sets: mutual funds and films. I chose these because one had a lot of numeric data values, whereas the other had a lot of qualitative and non-numeric data values. I then came up with about five questions to see how each of the programs handled them. As I used each program, I realized that there were both problems and surprises. With these two data sets and these four programs, the “no free lunch” theorem should apply, meaning that each program will excel at some tasks but will not be able to cover all possible tasks with the same proficiency.

The film data set offered the opportunity to create tasks such as finding the most awarded, most popular actor over the given time period. Another task was to find the number of films produced each year and then average the popularity for each year, in order to find a possible trend between popularity and the number of films produced in a year. For the mutual fund data set, the task was to pick the most lucrative mutual fund, which is not so simple. The factors were to choose the most lucrative long-term fund, international fund, and short-term fund. If you can hold out for ten years, then you would want to choose a fund that performed successfully over a long period, as opposed to concentrating on the three month, five month, and yearly performances.

A number of questions arose from trying to get these tasks done. These are just some of the tasks that a data analyst might be given along with the data sets. The questions were often about separating out simpler answers to the tasks and, when presented with multiple answers, deciding
which one should be chosen using the remaining data. For example, there are a number of films that share the highest popularity of eighty-eight, but should those that were awarded be placed higher? The tasks and questions were answered by manipulating color, shape, and various graph types.

Starting with Eureka, I found the compression mode very useful for noticing trends among varying attributes. Eureka's graphing was somewhat more limited than its easy-to-use compression mode. I found that Eureka provided great support for creating filtering tasks, such as separating out those films with no awards, or those films that had been given a higher popularity rating. The difference between the two data sets provided a surprise. The mutual funds had negative values and the films did not, which made the view in the compression mode interesting. I also found that the difference between numeric and non-numeric attributes was sometimes difficult for the program to graph, especially with multivariate non-numeric comparisons. I thought Eureka was quite useful once I got the hang of it, but it also seemed more data driven than some of the other programs.

The next program tested was InfoZoom. Initially I inserted the film data and began to inspect the different options. The data was placed into a table format, which was not really helpful for visualizing the data. The compression function was great, though. By compressing the data and then sorting it on some attribute, you were able to see correlations in the data and notice whether there were any outliers. The compression view was great and made for a simple and quick check. I used the compression view to determine whether there was a correlation between the length of a film and its popularity, and there was little or no evidence to support such a claim. I also used it to check for a correlation between popularity and genre. This check was a little more difficult to read, because in
These examples show both some of the better visualizations of InfoZoom and some of the more horrific visualizations that the program makes to represent the data. InfoZoom was not particularly good at producing visualizations that were easy and quick to manipulate.

SeeIt was drastically different from the other two programs. Immediately there was a visualization of the data, and it was simple to switch between attributes and to change the shape and color of the graph. I was surprised by the ability to create the “wall” graphs, which were capable of producing averages, counts, and sums. The aggregates were often left out of the other programs, or were really hard to find. SeeIt made it really easy to do more than one comparison in a single visualization. This was great for answering questions that arose from finding correlations. I found it particularly useful for finding the average popularity of an actor's films, and thus being able to
These examples are pretty standard for a SeeIt visualization, and you may notice that extra buckets are often created in the visualization because of the presence of outliers. Also, in the case of year, the axis starts at 0 and goes to 2000, but clearly not all of that range is needed, and it would make more sense to bring the range closer to the actual minimum and maximum of the data set.

Spotfire Pro was the final visualization platform that I tested. Spotfire, like SeeIt, showed an information visualization immediately after the data was inserted. Unlike in SeeIt, the data was not in a side window, which was actually great, because you were able to use the dual scroll bars and individual scroll bars to manipulate the data. I found that filtering was easy and fast, and so was sorting. I liked that you could change the attributes on the axes of the visualization very simply. I also liked that you could more or less zoom in on the visualization by filtering out the portions of the graph not desired. For the
tasks, Spotfire was very good, but I found that it also had a hard time with aggregate functions. The aggregate functions were not nearly as easily accessible as in some of the other programs. The surprising thing about Spotfire was the ability to make the size of the objects in the plot correspond to the value of an attribute, so you could have a graph of up to four variables by changing the size of the dots on the plot. I thought that was a really cool feature, since the other programs had a hard time dealing with more than two or three attributes within a scatter plot. Spotfire Pro had great filtering and sorting functions. Jittering and selections within the visualization were well suited to finding the maximum of an attribute. Here are some examples:
[Figure: Spotfire Pro plots of the film data set. Recoverable axis information: Year (1960 to 1990) on one axis, values from 0 to 80 on the other.]
The final one changes the size of the object according to whether the comedy film had been awarded or not. These examples are pretty hard to understand, but you can see some of the really neat features that you can manipulate using Spotfire Pro.

In conclusion, there are a number of great attributes that these visualization tools are able to cover, yet there isn’t one that really encapsulates all the great attributes of the others. I personally find that Spotfire Pro is the best tool among these. Spotfire gives a visualization, but it is not an overpowering or complicated visualization, and manipulating the data becomes rather easy and fast. I like the simple use of aggregates in SeeIt, but