Visualization Tool Analysis: Comparing Four Software Programs for Data Analysis, Study Guides, Projects, Research of Computer Science

Georgia Institute of Technology - Main Campus Computer Science

An analysis of four software programs used for data visualization: Eureka, InfoZoom, SeeIt, and Spotfire Pro. The author compares their attributes and performance on two data sets, mutual funds and films. The analysis covers the strengths and weaknesses of each tool and offers insights into their suitability for handling numeric and non-numeric data, filtering, and visualization.

Typology: Study Guides, Projects, Research

Pre 2010

Uploaded on 08/05/2009

koofers-user-zm3 🇺🇸

McPherson White

2/15/05

Info. Vis.

Visualization Tool Analysis

Each of the four software programs did some things better than the others. The no free lunch theorem seems an appropriate way to describe how no program was able to handle every situation equally well. To start the process, I chose two data sets, mutual funds and films. I chose these because one had a lot of numeric data values, whereas the other had a lot of qualitative and non-numeric data values. I then came up with about five questions and looked at how each program handled answering them. As I used each program, I ran into both problems and surprises. With these two data sets and these four programs, the “no free lunch” theorem should apply, meaning that each program will excel at some tasks but will not be able to cover all possible tasks with the same proficiency.

The film data set offered tasks such as finding the most awarded, popular actor over the given time period. Another task was to count the films produced each year and average their popularity by year, in order to look for a possible trend between popularity and the number of films produced in a year. For the mutual fund data set, the task was to pick the most lucrative mutual fund, which is not so simple. The subtasks were to choose the most lucrative long-term fund, international fund, and short-term fund. If you can hold out for ten years, you would want a fund that performed well over a long period rather than concentrating on the three-month, five-month, and yearly performances. These are just some of the tasks that a data analyst might be given with these data sets. A number of questions arose from trying to get these tasks done. The questions were often about separating out simpler answers to the tasks and, when presented with multiple answers, deciding which one should be chosen using the remaining data. For example, a number of films share the highest popularity of eighty-eight, but should those that were awarded be placed higher? The tasks and questions were answered by manipulating color, shape, and various graph types.

Starting with Eureka, I found the compression mode very useful for noticing trends among varying attributes. The graphing side of Eureka was somewhat more limited and less easy to use than the compression mode. Eureka provided great support for filtering tasks, such as separating out films with no awards or films that had been given a higher popularity rating. The difference between the two data sets provided a surprise: the mutual funds had negative values and the films did not, which made the view in compression mode interesting. I also found that the mix of numeric and non-numeric attributes was sometimes difficult for the program to graph, especially with multivariate non-numeric comparisons. I thought Eureka was quite useful once I got the hang of it, but it also seemed more data driven than some of the other programs.

The next program tested was InfoZoom. Initially I inserted the film data and began to inspect the different options. The data was placed into a table format, which was not really helpful for visualizing the data. The compression function was great, though. By compressing the data and then sorting it on some attribute, you could see correlations in the data and notice any outliers. The compression view made for a simple and quick check. I used it to determine whether there was a correlation between the length of a film and its popularity, and found little or no evidence to support such a claim. I also used it to check for a correlation between popularity and genre. This check was a little more difficult to read, because in
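A check like the length-versus-popularity comparison can also be made numerically, outside any of the four tools. The following is a minimal Python sketch and not part of the original analysis: the `pearson` helper, the field names `length` and `popularity`, and the sample rows are all hypothetical and are not taken from the film data set.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical rows standing in for the film data set (illustrative values only).
films = [
    {"length": 95, "popularity": 40},
    {"length": 120, "popularity": 62},
    {"length": 88, "popularity": 71},
    {"length": 150, "popularity": 35},
    {"length": 101, "popularity": 55},
]

r = pearson([f["length"] for f in films], [f["popularity"] for f in films])
print(f"length vs. popularity: r = {r:.2f}")
```

A coefficient near zero would agree with the "little or no evidence" reading, while values near 1 or -1 would point to a strong linear relationship.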

These examples show some of the better visualizations of InfoZoom as well as some of the more horrific visualizations that the program makes to represent the data. InfoZoom was not particularly good at producing visualizations that were easy and quick to manipulate.

SeeIt was drastically different from the other two programs. A visualization of the data appeared immediately, and it was simple to switch between attributes and to change the shape and color of the graph. I was surprised by the ability to create the “wall” graphs, which were capable of producing averages, counts, and sums. Aggregates were often left out of the other programs, or were really hard to find. SeeIt made it really easy to do more than one comparison in a single visualization. This was great for answering questions that arose from finding correlations. I found it particularly useful for finding an actor’s average film popularity, and thus being able to
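The averages, counts, and sums that the wall graphs produce amount to a group-by aggregation. A minimal Python sketch, assuming hypothetical records with `actor` and `popularity` fields; the field names, the `aggregate_by` helper, and the sample values are illustrative only and are not taken from the film data set or from SeeIt itself:

```python
from collections import defaultdict


def aggregate_by(records, key, value):
    """Group records by `key` and return count, sum, and average of `value`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record[value])
    return {
        group: {
            "count": len(values),
            "sum": sum(values),
            "average": sum(values) / len(values),
        }
        for group, values in groups.items()
    }


# Hypothetical film records (illustrative values only).
films = [
    {"actor": "Actor A", "year": 1985, "popularity": 60},
    {"actor": "Actor A", "year": 1987, "popularity": 80},
    {"actor": "Actor B", "year": 1985, "popularity": 30},
]

by_actor = aggregate_by(films, "actor", "popularity")
# Actor with the highest average popularity across their films.
top_actor = max(by_actor, key=lambda name: by_actor[name]["average"])
print(top_actor, by_actor[top_actor])
```

Passing `"year"` instead of `"actor"` as the key gives the number of films per year and the average popularity for each year, which is the trend comparison described among the film tasks.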

These examples are fairly standard for a SeeIt visualization. Notice that extra buckets are often created in the visualization because of the presence of outliers. Also, the year axis starts at 0 and goes to 2000, which is clearly more than is needed; it would make more sense to bring the range closer to the actual minimum and maximum of the data set.

Spotfire Pro was the final visualization platform that I tested. Like SeeIt, Spotfire showed an information visualization immediately after the data was inserted. Unlike SeeIt, the data was not in a side window, which was actually great, because you could use the dual scroll bars and individual scroll bars to manipulate the data. Filtering was easy and fast, and so was sorting. I liked that you could change the attributes on the axes of the visualization very simply. I also liked that you could effectively zoom in on the visualization by filtering out the portions of the graph not desired. For the tasks, Spotfire was very good, but I found that it also had a hard time with aggregate functions, which were not nearly as easily accessible as in some of the other programs. The surprising thing about Spotfire was the ability to make the size of the objects in the plot correspond to the value of an attribute, so you could have a graph of up to four variables by changing the size of the dot on the plot. I thought that was a very cool feature, since the other programs had a hard time dealing with more than two or three attributes within a scatter plot. Spotfire Pro had great filtering and sorting functions. Jittering and selections within the visualization were perfect for finding the highest value of an attribute. Here are some examples:

[Figure: Scatter Plot, "Popularity Scatter Plot"; Year axis from 1960 to 1990, value axis from 0 to 80]

The final one changes the size of the object according to whether the comedy film had been awarded or not. These examples are pretty hard to understand, but you can see some of the really neat features that you can manipulate using Spotfire Pro.

In conclusion, these visualization tools cover a number of great attributes, yet no single tool really encapsulates all the great attributes of the others. I personally find that Spotfire Pro is the best tool among these. Spotfire gives a visualization, but it is not an overpowering or complicated visualization, and manipulating the data becomes rather easy and fast. I like the simple use of aggregates in SeeIt, but
