



























Top notes on Basics in Data Management
Type: Notes




























Business process: a sequence of business activities aimed at producing a product or service, involving multiple resources (material, organizational, informational).

Information system: the set of information managed by business processes.
- Data assets: the raw material from which information is produced
- Set of procedures: for information acquisition, processing and production
- Set of human resources: who oversee the procedures
- Set of tools and instruments: for storage and processing of information

An information system is the set of components of an organization designed to acquire, process, store, retrieve, share and transmit information. A computer system is the technology supporting the information system: hardware, software, databases and management systems, communication networks.

EXAMPLE OF INFO AND COMPUTER SYSTEM
Municipality equipped with an information and computer system for detecting the level of smog
- Stored persistently
- Related to a reality of interest
- Serving a given organization

Database
A database is the set of information associated with collections of data that are:
- related to each other
- equipped with an appropriate description

It is:
- a single, large data repository
- shared within the enterprise by all applications and users
- persistent, i.e., with a much longer life than the management procedures
- a tool that allows you to always work on a consistent state of the data

Database description
The database must maintain its own description. This refers to a catalog or dictionary containing a set of data, called metadata, that is used to describe the data itself. All this is achieved through a software layer called the DBMS, which manages all data in an integrated manner, ensuring that operations are carried out efficiently and effectively.

Definition of DBMS
A DBMS is a set of programs that allows users to:
- Define: specify types, structures, and constraints on data
- Manipulate: insert, delete, update, retrieve data
- Control: regulate access to data, ensuring protection from failures and unwanted access to the database

A DBMS, therefore, makes it easier for users to use their database. Before the advent of DBMSs, a data store consisted of a set of files, and all operations and information management logic were the responsibility of the applications that interacted with the store. With the introduction of DBMSs, applications that interact with the database are greatly simplified.

Managing a phone book
Registration of the countless names of friends with their addresses, phone numbers, cell phone numbers…
- Personal approach: all data access operations are handled by dedicated programs written in the preferred language, which must also ensure the "persistence" of the data in private archives
- Approach using a DBMS: a DBMS product (free, downloadable from the Web) is used that allows data definition and management through the classic SELECT, INSERT, DELETE, UPDATE operations
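As a rough illustration of what the four classic operations do, the sketch below mimics them on a base R data frame standing in for a table. This is only an analogy: a real DBMS would express these operations in SQL and would also handle persistence and access control. The names and phone numbers are invented for the example.

```r
# Hypothetical phone book held in memory (a real DBMS would store it persistently)
phonebook <- data.frame(name  = c("Anna", "Luca"),
                        phone = c("555-0101", "555-0102"),
                        stringsAsFactors = FALSE)

# INSERT: add a new row
phonebook <- rbind(phonebook,
                   data.frame(name = "Marta", phone = "555-0103",
                              stringsAsFactors = FALSE))

# UPDATE: change the phone number of an existing entry
phonebook$phone[phonebook$name == "Luca"] <- "555-0199"

# DELETE: remove the rows that match a condition
phonebook <- phonebook[phonebook$name != "Anna", ]

# SELECT: retrieve the phone number of one person
selected <- phonebook[phonebook$name == "Marta", "phone"]
```

Every line of access logic here is the responsibility of the program itself, which is exactly the burden a DBMS takes over.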
A database system is the set formed by a database and a DBMS.

ANSI-SPARC three-level architecture
One of the first general architecture proposals for database systems was introduced in 1971 by the Data Base Task Group (DBTG), formed by the Conference on Data Systems Languages (CODASYL). The Standards Planning and Requirements Committee (SPARC) of the American National Standards Institute (ANSI) proposed a similar model (since then called the ANSI-SPARC architecture) whose
- ATOMICITY: the so-called "all or nothing" property: a transaction is atomic if it is executed in its entirety or not at all
- CONSISTENCY: a transaction is a transformation of one consistent state of the database into another consistent state. A DBMS, in particular, must ensure that all constraints defined on the database are satisfied
- ISOLATION: transactions must be executed independently of each other. This means that the partial effects of incomplete transactions should not be visible to other transactions
- DURABILITY: the effects of a transaction that is terminated by a "commit" must be permanently recorded in the database and never lost for any reason

Features of DBMS:
resources) are used to serialize concurrent transactions, preventing anomalies in access to information.
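The atomicity and consistency properties listed above can be pictured with a small in-memory sketch. It is only an illustration under stated assumptions: the accounts vector, the transfer() helper and the "no negative balance" constraint are all hypothetical, and a real DBMS achieves the same effect with logs and locks rather than by copying data.

```r
# Hypothetical "database": account balances that must never become negative
accounts <- c(alice = 100, bob = 50)

# transfer() works on a copy and returns it only if every step succeeds;
# on any error the original state is returned unchanged ("all or nothing")
transfer <- function(db, from, to, amount) {
  tryCatch({
    work <- db
    work[from] <- work[from] - amount
    work[to]   <- work[to] + amount
    # Consistency check: every constraint must hold before committing
    if (any(work < 0)) stop("constraint violated: negative balance")
    work                                  # "commit"
  }, error = function(e) db)              # "rollback"
}

accounts <- transfer(accounts, "alice", "bob", 30)    # valid: committed
accounts <- transfer(accounts, "alice", "bob", 500)   # invalid: rolled back
```

After the second call the balances are the same as after the first, so no partial effect of the failed transfer is visible.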
Suppose a bank, with offices and branches spread throughout the country, is equipped with a video surveillance system to monitor and control sensitive areas in order to detect dangerous events, such as possible robberies. The steps for the production and management of information are:
- A set of cameras, equipped with processors that give them "intelligence" and placed at various locations on the premises, continuously acquires information about the observed scene (information acquisition)
- These data are then sent via a dedicated communications network to a central processor or server, which stores them in digital archives specialized for video information (information storage)
- Data are then processed in real time by special software programs, automatically (without any human intervention) or semi-automatically, and then analysed by special operators (information processing)
- If anomalies are detected, operators can check for the actual presence of suspicious occurrences and report alarm information in a timely manner to security managers or directly to law enforcement (information communication)

As can be seen, the information system described above provides decision support for the activities of security officers, who, thanks to the help of an almost completely computerized system, receive not only the information to be analysed but also the results of its processing, with continuous and immediate feedback on their actions. Specifically, depending on the level of alert automatically generated by a software program, the operator could look at the video related to a certain area and check for suspicious events (such as a robbery in progress) and, in case of danger, alert law enforcement agencies. Note that in this case it is unthinkable for the IT system not to exist, as its absence would greatly impact the entire business organization.
Each office would have to be staffed with a certain number of security personnel to guard all sensitive areas; the communication of information to headquarters would be much more complicated and slow, and would have to follow rules yet to be defined; and the time for law enforcement agencies to intervene would be lengthened, resulting in delays.

Information systems in public administration
In Italy, examples of central public administrations that have adopted completely innovative solutions include:
- the Ministry of Agriculture and Forestry
- the National Health Information System
- the Public Education Information System
- the Ministry of Justice, which is in the process of computerizing the criminal and civil areas for everything related to the trial process

Document management: a very important aspect to highlight in PA information systems is undoubtedly the presence of systems for document management. Document management within administrations presents problems and critical issues of an organizational, technological and archival nature that require a systematic redefinition of structures, responsibilities, appropriate IT architectures, document flow management models and preservation methods. The introduction of a document management system within a public administration must, in other words, be harmoniously and functionally included as part of an information system supporting the administration's institutional
activities. A document management system consists of the set of documents produced and acquired for the institutional purposes of an agency, together with the administration system, i.e., the rules, procedures and resources necessary for the formation, organization, maintenance, retrieval, use and preservation of documents. To sum up, the macro-objectives to be achieved with a document management system are:
- The production/acquisition of reliable documents for legal and administrative purposes
- The intake and processing of documents
- The organization and maintenance of documentary production in an orderly manner consistent with the functions performed
- The transmission and preservation of authentic documents, i.e., intact and of certain and identified provenance
- The speed and efficiency of retrieval within the administrative work performed

Hospital information systems
A hospital information system is an integrated information system designed to manage all aspects of a hospital, whether administrative and financial, clinical, medical or research. This terminology also refers to the document management systems required in the health care organization, as well as the technological infrastructure for processing and transmitting the information handled. Usually, the system is divided into subsystems related to the various medical areas. At present, three different subsystems are commonly found:
- The Hospital Information System proper
- The Radiological Information System (RIS)
- The Picture Archiving and Communication System (PACS)

From the perspective of the information managed by the system, there are three main classes of data:
- Those related to patients
- Those related to the activities
- Those related to resources

A hospital information system, designed after careful analysis of the needs of the health care facility, is a tool aiding diagnostic decision-making and organizational activities.
All of this requires that the HIS, like all modern ISs, present itself as an integrated system that allows different information to be stored, accessed and shared.

Suppose that an orthopaedic trauma hospital is equipped with an information system to manage patient admissions. The steps for the production and management of information in such a case are:
- A set of hospital receptionists acquire admission requests from patients by telephone (information acquisition)
- Patient information is then entered by the employees themselves via terminals into the hospital computer system and, in particular, stored on a special central server as an integral part of computerized medical records (information storage)
- The archived information is then collected and analysed by the head physician and head nurse of the relevant department (information processing)
Analysis of air traffic information and possible problems impacts the decision-making activities of airport managers. In particular, depending on the problems that may be encountered (for example, the unavailability of some runways), managers might decide to cancel one or more flights, or to wait for the situation to evolve (runway restoration) and simply delay flights.
functions (which are themselves objects). The use of operators is relatively intuitive.

An R function may be sketched as follows:
[Figure not recovered: schematic of an R function]

The arguments can be objects ("data", formulae, expressions, …), some of which could be defined by default in the function. These default values may be modified by the user by specifying options. An R function may require no argument: either all arguments are defined by default (and their values can be modified with the options), or no argument has been defined in the function.

All the actions of R are done on objects stored in the active memory of the computer: no temporary files are used. The reading and writing of files are used for input and output of data and results (graphics, …). The user executes the functions via commands. The results are displayed directly on the screen, stored in an object, or written to the disk (particularly for graphics). Since the results are themselves objects, they can be considered as data and analysed as such. Data files can be read from the local disk or from a remote server through the internet.

The functions available to the user are stored in a library located on the disk in a directory called R_HOME/library (R_HOME is the directory where R is installed). The directory contains packages of functions, which are themselves structured in directories. The package named base is in a way the core of R and contains the basic functions of the language, particularly for reading and manipulating data. Each package has a directory called R with a file named like the package (for instance, for the package base, this is the file R_HOME/library/base/R/base). This file contains all the functions of the package.

One of the simplest commands is to type the name of an object to display its content. For instance, if an object n contains the value 10, typing n displays [1] 10. The digit 1 within brackets indicates that the display starts at the first element of n. This command is an implicit use of the function print and the above example is similar
to print(n) (in some situations, the function print must be used explicitly, such as within a function or a loop).

The name of an object must start with a letter (A-Z and a-z) and can include letters, digits (0-9), dots (.), and underscores (_).
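The points above about default arguments, implicit printing and object names can be tried out with a short sketch. The function names area and greeting and the object my.var_1 are hypothetical and chosen only for illustration.

```r
# A function with one required argument and one argument defined by default
area <- function(width, height = 1) width * height
a1 <- area(3)               # uses the default height = 1
a2 <- area(3, height = 2)   # the default is overridden by the user

# A function that requires no argument at all
greeting <- function() "hello"
g <- greeting()

# Typing the name of an object displays it (implicit call to print)
n <- 10
n
print(n)                    # explicit equivalent

# Inside a loop, print must be called explicitly to display values
for (i in 1:2) print(i)

# A valid object name: starts with a letter, then letters, digits, dots, underscores
my.var_1 <- 3
```

At the top level, n and print(n) produce the same display; inside the loop only the explicit print(i) shows anything.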
We can avoid displaying all these details with the option max.level.

To delete objects in memory, we use the function rm: rm(x) deletes the object x, rm(x, y) deletes both the objects x and y, and rm(list=ls()) deletes all the objects in memory. The same options mentioned for the function ls() can then be used to selectively delete some objects: rm(list=ls(pat="^m")).

The on-line help of R gives very useful information on how to use the functions. Help is available directly for a given function, for instance: > ?lm will display, within R, the help page for the function lm() (linear model). The commands help(lm) and help("lm") have the same effect. The last one must be used to access help with non-conventional characters. Calling help opens a page (this depends on the operating system) with general information on the first line, such as the name of the package where the documented function(s) or operators are found. Then comes a title followed by sections which give detailed information.

The help in HTML format (read, e.g., with Netscape) is called by typing > help.start(). A search with keywords is also possible in R with the function help.search. The latter looks for a specific topic, given as a character string, in the help pages of all installed packages. For instance, help.search("tree") will display a list of the functions whose help pages mention "tree".
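A minimal sketch of listing and deleting objects follows. As an assumption made purely for safety, it works inside a separate environment e instead of the user's workspace, so that running it does not wipe existing objects; at the prompt one would normally call ls() and rm() without the environment arguments. The help calls are shown as comments because they open a viewer.

```r
# Scratch environment so the example does not touch the global workspace
e <- new.env()
assign("m1", 1, envir = e)
assign("m2", 2, envir = e)
assign("x",  3, envir = e)

all_objects <- ls(e)                  # every object in e
m_objects   <- ls(e, pattern = "^m")  # only names starting with "m"

rm("x", envir = e)                                 # delete a single object
rm(list = ls(e, pattern = "^m"), envir = e)        # delete selectively by pattern
remaining <- ls(e)                                 # nothing is left

# Help (interactive use):
# ?lm
# help("lm")
# help.search("tree")
```

The pattern "^m" is a regular expression, so it selects every name that begins with the letter m.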
Other modes exist but they do not represent data, for instance function or expression. The length is the number of elements of the object. To display the mode and the length of an object, one can use the functions mode and length, respectively. Whatever the mode, missing data are represented by NA (not available). A very large numeric value can be specified with an exponential notation (for example, 2.1e23).

A value of mode character is input with double quotes ". It is possible to include this latter character in the value if it follows a backslash \. The two characters together, \", will be treated in a specific way by some functions, such as cat for display on screen or write.table for writing to disk. Alternatively, variables of mode character can be delimited with single quotes ('); in this case it is not necessary to escape double quotes with backslashes (but single quotes must be!).
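The notions of mode, length, missing values and quoting can be checked directly at the prompt. The sketch below uses hypothetical object names and values.

```r
x <- 1
A <- "Gomphotherium"
compar <- TRUE
z <- 1i

modes <- c(mode(x), mode(A), mode(compar), mode(z))
len_A <- length(A)              # a single character string has length 1

# Missing data are NA whatever the mode
v <- c(1, NA, 3)
n_missing <- sum(is.na(v))

# Exponential notation for a very large number
N <- 2.1e23

# Double quotes escaped with a backslash, or unescaped inside single quotes
s1 <- "Double quotes \" inside a string"
s2 <- 'Double quotes " inside a string'
same <- identical(s1, s2)

cat(s1, "\n")                   # cat displays the quote without the backslash
```

Both ways of writing the string give the same value; only the way it is typed differs.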
The variants of read.table are useful since they have different default values.

The function scan is more flexible than read.table. One difference is that it is possible to specify the mode of the variables. For example, a call to scan can read three variables from the file data.dat, the first of mode character and the next two of mode numeric. Another important distinction is that scan() can be used to create different objects: vectors, matrices, data frames, lists, … In the example just described, if the result is assigned to mydata, then mydata is a list of three vectors. By default, that is, if what is omitted, scan() creates a numeric vector. If the data read do not correspond to the mode(s) expected (either by default, or as specified by what), an error message is returned.
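A minimal sketch of such a call is given below. Since the original data.dat is not available, the example writes a small hypothetical file to a temporary location first; the file contents and the quiet = TRUE setting are assumptions made for the illustration.

```r
# Create a small stand-in for data.dat (hypothetical contents)
tmp <- tempfile(fileext = ".dat")
writeLines(c("a 1 2",
             "b 3 4"), tmp)

# what = list("", 0, 0): first variable character, next two numeric
mydata <- scan(tmp, what = list("", 0, 0), quiet = TRUE)

# With what omitted, scan() expects numeric values only
nums_file <- tempfile()
writeLines("1 2 3 4", nums_file)
nums <- scan(nums_file, quiet = TRUE)
```

Here mydata is a list of three vectors, one per variable, while nums is a plain numeric vector.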
The function read.fwf can be used to read data in fixed width format from a file. The options are the same as for read.table() except widths, which specifies the width of the fields (buffersize is the maximum number of lines read simultaneously). For example, if a file named data.txt contains fixed-width data, one can read it with a single read.fwf command giving the width of each field.

The function write.table writes an object to a file, typically a data frame, but this could be another kind of object (vector, matrix, …). The arguments and options are:
[Table of arguments and options not recovered]
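The two functions can be sketched together as follows. The fixed-width lines written to the temporary file are hypothetical, as are the chosen widths (1, 4 and 3 characters); the write.table options shown are only a small selection of those available.

```r
# Hypothetical fixed-width file: 1 character, then 4 characters, then 3 characters
fwf_file <- tempfile(fileext = ".txt")
writeLines(c("A1.501.2",
             "A1.551.3",
             "B1.601.4"), fwf_file)

mydata <- read.fwf(fwf_file, widths = c(1, 4, 3))

# Write the data frame back to disk as a tab-separated file with a header line
out_file <- tempfile(fileext = ".txt")
write.table(mydata, file = out_file, quote = FALSE, sep = "\t",
            row.names = FALSE, col.names = TRUE)

# Read it again to check the round trip
again <- read.table(out_file, header = TRUE, sep = "\t")
```

Because the fields are not separated by any character, the widths argument is the only information read.fwf has for splitting each line.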
It is useful in statistics to be able to generate random data, and R can do it for a large number of probability density functions. These functions are of the form rfunc(n, p1, p2, …), where func indicates the probability distribution, n the number of data generated, and p1, p2, … are the values of the parameters of the distribution. The above table gives the details of each distribution and the possible default values (if no default value is indicated, the parameter must be specified by the user). Most of these functions have counterparts obtained by replacing the letter r with d, p or q to get, respectively, the probability density (dfunc(x, …)), the cumulative probability density (pfunc(x, …)), and the value of the quantile (qfunc(p, …), with 0 < p < 1).

It is possible to create an object and specify its mode, length, type, etc. One can, for instance, create an "empty" object and then modify its elements successively, which is more efficient than putting all its elements together with c(). It can also be very convenient to create objects from others. For example, to fit a series of models, it is simple to put the formulae in a list and then to extract the elements successively to insert them in the function lm. The explicit construction of objects gives a better understanding of their structure and allows us to go further into some notions previously mentioned.

The function vector, which has two arguments, mode and length, creates a vector whose elements have a value depending on the mode specified as argument: 0 if numeric, FALSE if logical, or "" if character. The following functions take the length of the vector as their single argument: numeric(), logical() and character().

A matrix is a vector with an additional attribute (dim), which is itself a numeric vector of length 2 and defines the numbers of rows and columns of the matrix. It can be created with the function matrix. The option byrow indicates whether the values given by
data must fill successively the columns (the default) or the rows (if TRUE). The option dimnames allows names to be given to the rows and columns.

We have seen that a data frame is created implicitly by the function read.table; it is also possible to create a data frame with the function data.frame. The vectors so included in the data frame must be of the same length, or if one of them is shorter, it is "recycled" a whole number of times.

R offers a remarkable variety of graphics. To get an idea, one can type demo(graphics) or demo(persp). It is not possible to detail here the possibilities of R in terms of graphics, particularly since each graphical function has a large number of options, making the production of graphics very flexible. The result of a graphical function cannot be assigned to an object but is sent to a graphical device, which is a graphical window or a file. There are two kinds of graphical functions: the high-level plotting functions, which create a new graph, and the low-level plotting functions, which add elements to an existing graph. The graphs are produced with respect to graphical parameters, which are defined by default and can be modified with the function par.
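The pieces described above (random generation, explicit construction of vectors, matrices and data frames, and the two kinds of plotting functions) can be tied together in one short sketch. The seed, the object names and the choice of the normal distribution are assumptions made for the example, and pdf(NULL) is used only so that the code runs without opening a window or writing a file.

```r
set.seed(1)                          # assumption: fixed seed for repeatability

# Random generation and the d/p/q counterparts for the normal distribution
x <- rnorm(10, mean = 0, sd = 1)     # 10 random values
d <- dnorm(0)                        # probability density at 0
p <- pnorm(1.96)                     # cumulative probability up to 1.96
q <- qnorm(0.975)                    # quantile for p = 0.975

# Explicit construction of "empty" vectors
v  <- vector(mode = "numeric", length = 3)   # elements are 0
l  <- logical(2)                             # elements are FALSE
ch <- character(2)                           # elements are ""

# Matrices: filled by columns (default) or by rows
m_col <- matrix(1:6, nrow = 2)
m_row <- matrix(1:6, nrow = 2, byrow = TRUE,
                dimnames = list(c("r1", "r2"), c("a", "b", "c")))

# Data frame: the shorter vector is recycled a whole number of times
df <- data.frame(id = 1:4, group = c("a", "b"), value = x[1:4])

# Graphics: a high-level function creates the graph, low-level ones add to it
pdf(NULL)                            # null device: nothing is displayed or saved
old <- par(mar = c(4, 4, 1, 1))      # modify a graphical parameter
plot(df$id, df$value)                # high-level: new graph
points(df$id, rev(df$value), pch = 2)  # low-level: add points
abline(h = 0, lty = 2)               # low-level: add a line
par(old)                             # restore the previous parameters
invisible(dev.off())
```

Storing the value returned by par() and passing it back afterwards is a common way to restore the previous graphical parameters.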