Understanding Information Systems: Types, Characteristics, and Retrieval, Study notes of Information Systems

An in-depth exploration of information systems, their types, characteristics, and the importance of effective information retrieval. Topics include the definition of information systems, different types of systems, the role of information systems in organizations, and information retrieval techniques. Students will gain a solid foundation in the fundamentals of information systems and the skills necessary to navigate the vast amount of information available.

Typology: Study notes

2021/2022

Uploaded on 09/12/2022

little_rachel

cis20.2
design and implementation of software applications II
spring 2008
session # II.1
information models and systems
topics:
what is information systems?
what is information?
knowledge representation
information retrieval
cis20.2-spring2008-sklar-lecII.1 1
what is information systems?
the field of information systems (IS) comprises the following:
a number of types of computer-based information systems
objectives
risks
planning and project management
organization
IS development life cycle
tools, techniques and methodologies
social effects
integrative models
types of information systems
informal
evolve from patterns of human behavior (can be complex)
not formalized (i.e., designed)
rely on “word of mouth” (“the grapevine”)
manual
formalized but not computer based
historical handling of information in organizations, before computers (i.e., human “clerks” did all the work)
some organizations still use aspects of manual IS (e.g., because computer systems are expensive or don’t exist to replace specialized human skills)
computer-based
automated, technology-based systems
typically run by an “IT” (information technology) department within a company or organization (e.g., ITS at BC)
computer-based information systems
data processing systems (e.g., accounting, personnel, production)
office automation systems (e.g., document preparation and management, database systems, email, scheduling systems, spreadsheets)
management information systems (MIS) (e.g., produce information from data, data analysis and reporting)
decision support systems (DSS) (e.g., extension of MIS, often with some intelligence, allow prediction, posing of “what if” questions)
executive information systems (e.g., extension of DSS, contain strategic modeling capabilities, data abstraction, support high-level decision making and reporting, often have fancy graphics for executives to use for reporting to non-technical/non-specialized audiences)

why do organizations have information systems?
• to make operations efficient
• for effective management
• to gain a competitive advantage
• to support an organization’s long-term goals

IS development life cycle
• feasibility study
• systems investigation
• systems analysis
• systems design
• implementation
• review and maintenance

social effects of IS

• change management
• broad implementation (not just about software)
• education and training
• skill change
• societal and cultural change

integrative models
• computers in society
• the internet revolution (internet 2, web 2.0)
• “big brother”
• ubiquitous computing

meaning versus form

• is the form of information the information itself? or another kind of information?
• is the meaning of a signal or message the signal or message itself?
• representation (from Norman 1993)
  – why do we write things down?
    ∗ Socrates thought writing would obliterate serious thought
    ∗ sound and gestures fade away
  – artifacts help us reason
  – anything not present in a representation can be ignored (do you agree with that?)
  – things left out of a representation are often those things that are hard to represent, or we don’t know how to represent them

The Library of Babel, by Jorge Luis Borges (1941)
• a story about a universe composed of an indefinite (possibly infinite) number of hexagonal rooms, each containing walls of bookshelves that contain books which, in turn, contain all possible combinations of letters
• is this information? data? knowledge? intelligence?
• how is the internet like (or unlike) the library of babel?

information theory

• Claude Shannon, 1940s, Bell Labs
• studied communication and ways to measure information
• communication = producing the same message at its destination as at its source
• problem: noise can distort the message
• message is encoded between source (transmitter) and destination (receiver)
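One common way to make "measuring information" concrete is Shannon entropy, the average number of bits per symbol needed to encode a source. The slide does not state a formula, so the sketch below assumes the standard definition H = -Σ p·log2(p) over symbol frequencies; the function name `entropy_bits` and the sample messages are illustrative only.

```python
import math
from collections import Counter


def entropy_bits(message: str) -> float:
    """Shannon entropy of the symbol distribution in `message`, in bits per symbol."""
    if not message:
        return 0.0
    counts = Counter(message)
    total = len(message)
    # H = sum over symbols of p * log2(1/p), where p = count / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# A message with one repeated symbol carries no information per symbol;
# the more evenly the symbols are spread, the higher the entropy.
for sample in ("aaaa", "abab", "abcd"):
    print(sample, entropy_bits(sample))
```

A perfectly predictable message scores zero, which matches the intuition that a message you could have guessed tells you nothing new.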


communication theory

• many disciplines: mass communication, media, literacy, rhetoric, sociology, psychology, linguistics, law, cognitive science, information science, engineering, medicine...
• human communication theory: do you understand what I mean when I say something?
• what does it mean to say a message is received? is received the same as understood?
• the conduit metaphor
• meaning: syntactic versus semantic

information theory today
• total annual information production, including print, film, media, etc., is between 1 and 2 exabytes ($10^{18}$ bytes) per year
• how do we organize this?
• and remember, it accumulates!
• information hierarchy: data → information → knowledge → intelligence

information retrieval

• information organization versus retrieval
• organization: categorizing and describing information objects in ways that the people who need them can use
• retrieval: being able to find the information objects you need when you need them
• two key concepts:
  – precision: of what I found, how much was what I wanted?
  – recall: of everything relevant, how much did I find?
• ideally, we want to maximize both precision and recall; this is the primary goal of the field of information retrieval (IR)
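In standard IR usage, precision is the fraction of retrieved items that are relevant, and recall is the fraction of relevant items that were retrieved. The sketch below computes both from two sets of document ids; the function name and the ids are hypothetical, chosen only for illustration.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple[float, float]:
    """Return (precision, recall) for one query.

    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|
    """
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents came back, 3 documents were actually relevant.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d7"}
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

The two measures pull against each other: returning every document gives perfect recall with poor precision, while returning a single sure hit gives perfect precision with poor recall.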

IR assumptions

• information remains static
• query remains static
• the value of an IR solution is in how well the retrieved information meets the needs of the retriever
• are these good assumptions?
  – in general, information does not stay static, especially on the internet
  – people learn how to make better queries
• problems with the standard model on the internet:
  – “answer” is a list of hyperlinks that then need to be searched
  – answer list is apparently disorganized

IR process

• IR is iterative
• IR doesn’t end with the first answer (unless you’re “feeling lucky”...)
• because humans can recognize a partially useful answer; automated systems cannot always do that
• because humans’ queries change as their understanding improves through the results of previous queries
• because sometimes humans get an answer that is “good enough” to satisfy them, even if the initial goals of IR aren’t met

zone search

• a “zone” is an identified region within a document
• typically the document is “marked up” before you search
• content of a zone is free text (unlike parametric fields)
• zones can also be indexed
• example: search for a book with certain keyword in the title, last name in author and topic in body of document
• does this make the web a database? not really (which you’ll see when we get into database definitions next week)
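The book example can be sketched as a search over documents that have already been marked up into zones. This is a minimal illustration, not an indexed implementation: the zone names (`title`, `author`, `body`), the sample records, and the function names are all hypothetical.

```python
import re


def tokens(text: str) -> set:
    """Lowercased words in a free-text zone."""
    return set(re.findall(r"\w+", text.lower()))


def zone_search(docs: dict, query: dict) -> list:
    """Return ids of documents where every queried zone contains its keyword."""
    matches = []
    for doc_id, zones in docs.items():
        if all(word.lower() in tokens(zones.get(zone, ""))
               for zone, word in query.items()):
            matches.append(doc_id)
    return matches


# Hypothetical marked-up collection: each document is a dict of zone -> free text.
books = {
    "b1": {"title": "Information Retrieval Basics", "author": "Ada Smith",
           "body": "ranking and scoring of search results"},
    "b2": {"title": "Database Basics", "author": "Ada Smith",
           "body": "tables and queries, retrieval of rows"},
    "b3": {"title": "Retrieval in Practice", "author": "Bo Jones",
           "body": "scoring documents by zone"},
}

# keyword in title, last name in author, topic in body
print(zone_search(books, {"title": "retrieval", "author": "smith", "body": "scoring"}))
```

Because each keyword is checked only against its own zone, "retrieval" in the body of `b2` does not count as a title match.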


scoring and ranking

• search results can either be Boolean (match or not) or scored
• scored results attempt to assign a quantitative value to how good the result is
• some web searches can return a ranked list of answers, ranked according to their score
• some scoring methods:
  – linear combination of zones (or fields)
  – incidence matrices

linear combination of zones
• assign a weight to each zone (or field) and evaluate:
  $$\text{score} = 0.6 \cdot (\text{Brooklyn} \in \text{neighborhood}) + 0.5 \cdot (3 \in \text{bedrooms}) + 0.4 \cdot (1000 = \text{price})$$
• problem: it is frequently hard for a user to assign a weighting that adequately or accurately reflects their needs/desires
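The weighted sum can be sketched as follows, treating each zone test as 1 when it matches and 0 otherwise. The weights 0.6, 0.5 and 0.4 and the query values come from the slide; the listings, their names, and the exact-match comparison are assumptions made for illustration.

```python
# Weights and query values taken from the slide's example.
WEIGHTS = {"neighborhood": 0.6, "bedrooms": 0.5, "price": 0.4}
QUERY = {"neighborhood": "Brooklyn", "bedrooms": 3, "price": 1000}


def zone_score(listing: dict, query: dict = QUERY, weights: dict = WEIGHTS) -> float:
    """Sum the weights of the fields whose value matches the query exactly."""
    return sum(w for field, w in weights.items()
               if listing.get(field) == query[field])


# Hypothetical apartment listings.
listings = {
    "apt_a": {"neighborhood": "Brooklyn", "bedrooms": 3, "price": 1000},
    "apt_b": {"neighborhood": "Brooklyn", "bedrooms": 2, "price": 1200},
    "apt_c": {"neighborhood": "Queens", "bedrooms": 3, "price": 1000},
}

# Rank listings from best to worst score.
ranked = sorted(listings, key=lambda name: zone_score(listings[name]), reverse=True)
for name in ranked:
    print(name, round(zone_score(listings[name]), 2))
```

Changing the weights reorders the results, which is exactly why choosing them well is hard for a user.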


incidence matrices

• recall: document (or a zone or field in the document) is a binary vector $X \in \{0,1\}^v$ (one entry per vocabulary term)
• query is a vector $Y$
• score is overlap measure: $|X \cap Y|$
• example:

                Julius Caesar   The Tempest   Hamlet   Othello   Macbeth
  Antony        1               …             …        …         …
  Brutus        1               …             …        …         …
  Caesar        1               …             …        …         …
  Calpurnia     1               …             …        …         …
  Cleopatra     0               …             …        …         …

• score is the sum of the entries in a row (or column, depending on what the query is)
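The overlap measure |X ∩ Y| can be sketched with binary vectors over a fixed vocabulary. The vocabulary reuses the five terms from the example; the documents `doc_a` to `doc_c` and their contents are hypothetical and do not reproduce the plays' actual incidence values.

```python
VOCAB = ["antony", "brutus", "caesar", "calpurnia", "cleopatra"]


def to_vector(terms: set, vocab: list = VOCAB) -> list:
    """Binary incidence vector: 1 if the vocabulary term is present, else 0."""
    return [1 if term in terms else 0 for term in vocab]


def overlap(x: list, y: list) -> int:
    """Overlap measure |X ∩ Y|: number of positions where both vectors are 1."""
    return sum(a & b for a, b in zip(x, y))


# Hypothetical documents, each described by the set of terms it contains.
docs = {
    "doc_a": {"antony", "brutus", "caesar", "calpurnia"},
    "doc_b": {"antony", "cleopatra"},
    "doc_c": {"caesar", "cleopatra"},
}

query = to_vector({"brutus", "caesar"})
scores = {name: overlap(to_vector(terms), query) for name, terms in docs.items()}
print(scores)
```

Every matching term adds exactly 1 to the score, however often or rarely it occurs.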

• problem: overlap measure doesn’t consider:
  – term frequency (how often does a term occur in a document)
  – term scarcity in collection (how infrequently does the term occur in all documents in the collection)
  – length of documents searched
• what about density? if a document talks about a term more, then shouldn’t it be a better match?
• what if we have more than one term? this leads to term weighting

term weighting

• in previous matrix, instead of 0 or 1 in each entry, put the number of occurrences of each term in a document
• this is called the “bag of words” (multiset) model
• problem:
  – score is based on syntactic count but not on semantic count
  – e.g.: “The Red Sox are better than the Yankees.” is the same as “The Yankees are better than the Red Sox.” (well, only in this example...)
• count versus frequency
  – search for documents containing “ides of march”
  – Julius Caesar has 5 occurrences of “ides”
  – No other play has “ides”
  – “march” occurs in over a dozen plays
  – All the plays contain “of”
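The Red Sox example can be checked directly: a bag of words keeps term counts but discards word order, so the two sentences become the same multiset. This is a minimal sketch; the tokenizer (lowercase, letters only) is an assumption, not something the slides specify.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Multiset of lowercased words; word order is discarded."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


a = bag_of_words("The Red Sox are better than the Yankees.")
b = bag_of_words("The Yankees are better than the Red Sox.")

print(a)
print(a == b)  # the two sentences are indistinguishable under this model
```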

• By this scoring measure, the top-scoring play is likely to be the one with the most “of”s; is this what we want?
• NOTE that in the IR literature, “frequency” typically means “count” (not really “frequency” in the engineering sense, which would be count normalized by document length...)
• term frequency (tf)
  – somehow we want to account for the length of the documents we are comparing
• collection frequency (cf)
  – the number of occurrences of a term in a collection (also called corpus)
• document frequency (df)
  – the number of documents in a collection (corpus) containing the term
• tf x idf or tf.idf
  – tf = term frequency
  – idf = inverse document frequency; could be $1/\mathrm{df}$, but more commonly computed as:
    $$\mathrm{idf}_i = \log\left(\frac{n}{\mathrm{df}_i}\right)$$

• “weight” of term $i$ occurring in document $d$ ($w_{i,d}$) is then:
  $$w_{i,d} = \mathrm{tf}_{i,d} \times \mathrm{idf}_i = \mathrm{tf}_{i,d} \times \log(n/\mathrm{df}_i)$$
  where
  $\mathrm{tf}_{i,d}$ = frequency of term $i$ in document $d$
  $n$ = total number of documents in collection
  $\mathrm{df}_i$ = number of documents in collection that contain term $i$
• weight increases with the number of occurrences within a document
• weight increases with the rarity of the term across the whole collection
• so now we recompute the matrix using the $w_{i,d}$ formula for each entry in the matrix, and then we can do our ranking with a query
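Recomputing the matrix with tf × idf weights and ranking against a query can be sketched as below. The slides do not fix the base of the logarithm, so the natural log is assumed here; the three-document corpus and the function names are hypothetical, and the query score is taken as the sum of the weights of the query terms in each document.

```python
import math
from collections import Counter


def tf_idf_weights(docs: dict) -> dict:
    """Map each document to {term: tf * log(n / df)} (natural log assumed)."""
    n = len(docs)
    tf = {name: Counter(words) for name, words in docs.items()}
    df = Counter()
    for counts in tf.values():
        df.update(counts.keys())  # each document counts once per distinct term
    return {name: {term: count * math.log(n / df[term])
                   for term, count in counts.items()}
            for name, counts in tf.items()}


def rank(query: list, weights: dict) -> list:
    """Order documents by the summed weights of the query terms, best first."""
    scores = {name: sum(w.get(term, 0.0) for term in query)
              for name, w in weights.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical corpus: "of" is everywhere, "march" is common, "ides" is rare.
corpus = {
    "d1": ["ides", "of", "march", "of"],
    "d2": ["march", "of", "of", "of", "of"],
    "d3": ["of", "of", "of"],
}

weights = tf_idf_weights(corpus)
print(rank(["ides", "of", "march"], weights))
```

A term that appears in every document gets idf = log(1) = 0, so the many occurrences of "of" no longer influence the ranking.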