



Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
The ETL process model for a manufacture cells production line integration. It explains how the unified information in the proposed DM is loaded from Management Information Systems (MIS), with historic data from software development project. The document also discusses the creation of a Decision Support System (DSS) using Online Analytical Processing (OLAP) to support quantitatively the Process Areas of Measurement and Analysis (MA) of Capability Maturity Model Integration (CMMi).
Typology: Lecture notes
1 / 7
This page cannot be seen from the preview
Don't miss anything!




This article describes a Causal Analysis and Resolution of a problem for the elaboration of a Data Mart – DM specialized in Production Control and Planning – PCP. The unified information in the proposed DM is loaded form Management Information Systems – MIS, with historic data from software development project. To allow the integrated load from different data sources, it was developed a Extract, Transform and Load – ETL process, whose implementation has integrated data from the MIS of Function Point Analysis and Personal Software Process – PSP in a software factory. That allowed the construction of a Decision Support System – DSS over the constructed DM. This DSS allows the use of Online Analytical Processing – OLAP to support quantitatively the Process Areas of Measurement and Analysis – MA of Capability Maturity Model Integration – CMMi.
Researchers from TCS and Research Group of Software Engineering ( Grupo de Pesquisa de Engenharia de Software – GPES ) from the Brazilian Aeronautics Institute of Technology ( Instituto Tecnológico de Aeronáutica – ITA ), based on some Process Areas of Capability Maturity Model Integration – CMMI, created a process of Extract, Transform and Load – ETL to software factories specialized in Line Production Lines of Manufacture Cell – PLMC [1]. It was specified, dimensioned, projected and tested a process, by means of Causal Analysis and Resolution – CAR to integrate in a Data Mart information of executed software projects [2] from different data sources. The created Data Mart consolidates information from the MIS of software estimation by Function Point Analysis – FPA [3] and of MIS of Personal Software Process – PSP, keep in the corporation.
The objective is to allow the use of Online Analytical Processing – OLAP [4] to the Process Area of Measurement and Analysis – MA from Capability Maturity Model Integration – CMMI. The diverse concepts of software engineering and quality that was used in the construction of a Decision Support System – DSS using OLAP [5] formed the base to the developed Control Projects System. The Data Mart – DM was modeled in a way to supply a base to Knowledge Discovering in Databases – KDD [2], like Data Mining – DM. The SIG’s project followed the Service Oriented Architecture – SOA [6] to the development of the needed environment to the integration models. The work objective was to develop an ETL process with the technical requirements of SOA application to supply a DM of a PLCM.
ETL is a process methodology for Extracting, Transforming and Loading data from transactional databases to multidimensional databases, like Data Marts
be consolidated in technical reports in an automated way [10]. These reports can be stored and published to the department that supplies the base to the Process Area. The Decision Support System was purpose to the PLMC. It was developed to support the Quantitative Project Management – QOM Process Area - PA, described by Carnegie Mellon University as an integrated part of CMMI version 1.2 [11]. One of the interest areas of this research was the alternative solutions analysis for the data
study consequence was the ETL process customized prototyping to a DM for a PLMC.
The initial concern was to identify the dimensions to be modeled from the transactional MIS. It is required the multidimensional modeling of DW before the ETL process definition. After that, it was identified some alternatives to the resolution of automated generation project reports process, whose information would be provided by the DW modeled. To the execution of the automated ETL, it was necessary the Extraction activities planning, Transforming and Load. The first part of this ETL process involved data extraction from the MIS databases of FPA and PSP. After that, Transformation activities were critical to ensure that the data can be grouped and summarized like the DW model. Transformation usually involves rules application and functions over the raw data extracted from data sources to allow its compatibility with the model. The problem statement was: “To refine ETL methods to integration of MIS in DW of software engineering projects, which allowed the gauge of dimensioning by means of automated generation of technical reports” [12]. Data Marts are multidimensional databases generally specialized in one department of a company. When it is obtained data from more than one area, that interest to the general goals of the company, it can grow up to a Data Warehouse. So, DW can emerge from top-down or bottom-up approach. Top-Down is when a company creates a DW and after that goes to segmentation, dividing in minor areas, oriented by department subjects. In Bottom approach, the company prefers first create a Data Mart for most important or critical departments. That approach costs less than the DW and reduce risks. Then, when the Data Marts requirements are mature enough, the focus goes to their integration in a DW. The hardware and software infra-structure of DM and DW can be similar, but the data architecture can be significantly different. The SOA defined platform unified the needed requirements to the technological implementation of the business in a three-tier environment. In the database tier, it was adopted the Bottom-up strategy to obtain the DW
from the MIS FPA and PSP. The objective was to allow OLAP analysis to supply a tool for decision taking. It was chosen as OLAP tool the SQL Server 2000 that allowed all the process of collection, storage and publishing of the needed information. It was adopted Key Performance Indicators – KPIs to provide to the management quick insights about the projects situation. The identified problem was applied over the software engineering projects by means of a computational subsystem that allowed a new control and automation of MIS models, filled with manual data entries from distributed databases [13].
The process re-engineering was based with on monitoring data of the process. The CAR process applied has oriented the existing refinement collection data manipulation process and information about dimensioning and software control. The start point of the process was to understand the reason for dimensioning data tabulation lateness provided by FPA. From this point, it was initialized a process of improvement of this bottle-neck. In the data analysis, the company aimed to get a new level of knowledge on your own production. Each software development company knows how much of its resources to the execution of a specific contract. It is fundamental to strategic planning maintenance that projects, to be developed, are monitored and its limitations are known. With this strategic vision, the embedded problems in software development process can be identified and resolved on time. It is a fundamental point to justify the use of OLAP. The start point to the difficult found in a PLMC of TCS was a reverse engineering realization. Based on the concepts suggested by Sommerville, an analysis process must have a study object, and in this case it was the information automation process about the software dimensioning projects in a PLMC. The technical capability to recover information about the original project could be applied aiming to optimize the next projects [14]. For the enterprise continuous offering of projects and its survival in the competitive business, TCS had to analyze its competitors and observe their performance. The comparing with its own results has occurred to fundament a strategic decision in production lines to the products offering and services. The conscious competition administration loss allowed performance improvement re-planning activities. In this vision, the institution has performed processes revisions aiming to find a continuous improvement for its process, to revitalize your organizational projects. A commission of Research and Development – R&D analyzed objective indicators against the monitored Process Areas that
new indicator should be done until the point of working together with the previous system. The team of R&D needs to avail themselves of all artifacts related to the process to ensure the correct record of the occurrence, when the team stabilize the process and indicator does not present the problem in an activity similar to the replacement should be implemented successfully. 5 - For complex cases in which the solution is beyond the material and intellectual resources of the occasion, a research stage to obtain this knowledge is central to solving the problem. Conducted research and made the knowledge acquisition, a new prototype should be done and analyzed again by their method. 6 - The team of R&D should follow the development team to make sure that the implementations were carried out according to the determined.
7 - During development, the R&D team can verify if the implementation and the requirements have been met in accordance with the request. 8 - The new process has been tested in operation and had to stay under observation up to the validation after batteries of tests in production. 9 - To prevent some collected knowledge part was lost, and depending on complexity, the project type and risk involved, the R&D teams have created a technical report on the component's project. But what is the advantage achieved by this reverse engineering process as sophisticated? The advantage was that with this scientific method used, the problems are replaced by a formal treatment as well as its solution. The first part of the problem identification of system studied was the classification of the patterns of problem presented in Table2 [1]. Table 2: Classification Generic Standards Issues
This action was essential to that records' observations obtained allowing an analysis applied later. The assembly of the DM with data basis distributed with APF estimates from projects was the first step to identify the behavior of
LPCM. In this study the data collected came from more than one basis by estimates of APF project to be processed individually in order to obtain a specialized DM. The Table 2 the team of P&D found that the patterns of problems identified in this analysis (CAR) was the analysis pattern. Based in the distributed projects databases the production teams began manual analysis activities for an APF atmosphere parameters definition. In created DM they were accomplished manual consolidations that originated difficulties in an acquisition maintenance of these registrations. After the process focus bottle neck identification, P&D team solved mapping process. This resolution is presented in the Picture 3.
Table 3: Process for Identifying Patterns of Problems and Solutions
The MIS identification scenario needed to be supported by a technology that had the capability to integrate flocks distributed data. In research in the database area, methods that suited the needs identified by the analysis were met by the ETL method. Complementing the reasoning line presented during the developing ETL component process with an approach to be chosen by the team of R&D. Since the beginning of the process sought the establishment of a working prototype.
Generic Pattern Classification Problem Architectural: The structural organization of systems hardware and software. Project: The organization of activities around tasks for the performance of a product. Process: The organization in an orderly manner in the task within an activity. Programming: Introduces the programming solutions to implement the requirements requested by the client. example - the choice of language. Analysis: The process of verification and validation of documents and records to ascertain whether the specification of a client was treated in accordance with the contract.
Identification Process
Issues Standards Solutions Standards
ETL was the methodology defined.
Were identified a pattern of structured coding ETL component.
The parameterization of the data structures of the two distributed databases to be integrated with the data mart.
Was developed an ETL standard for the integration of new distributed databases.
The MIS integration model has established specifications for the construction of an SQL Server 2000 ETL component, which contains the MIS of APF and PSP in order to meet the subjects terms studied in the program. Were organized activities, processes, methods, techniques and software engineering tools to integrate the solution according to guidelines development project [11]. The quality of products and processes will be used in the project with the quality software engineering standards along with the CMMi guidelines [1]. One research products was a CAR methodology for ETL components creation.
To compose an instrument new, there are following steps to create the prototype of analysis' instrument, so presented himself process defined that enable the development, though is possible future considerations. The proposed study has the following steps in Table 4: To implement the ETL component was necessary a theoretical and practice study on the technology, this step is part of the learning process. Based on the data structure, the modeling provided an analysis of ETL could be applied to determine the requirements integration specification. One of the requirements were functional ETL verification and validation. Table 5 presents the main testing activities process of User. For the data collection and technical information necessary for the development of the ETL component used the Delphi method [1]. The test plan for validation of the component passed the ETL functionality needed to load the instrument. The test performed was derived standards for software products shown in Table 5. This table shows the main activities of the testing process of User. For the data collection and technical information necessary for the development of ETL component we used the Delphi method [22]. The test plan for validation of the ETL component passed by functionality needed to load the instrument. Table 4. Process of development of ETL. Development process of the ETL component 1 - Formulation of the problem. 2 - Planning research. 3 - Elaboration of ETL component. 4 - Load Testing. 5 - Verification of the data. 6 - Data analysis and interpretation of data and results. 7 - Presentation of data and results. 8 - Verification and Validation.
The Picture 5 presents main activities of user's test process. For data rising, it was necessary technical information for component ETL development used - if method Delphi [22].
The plan of tests for a component ETL validation went by necessary functionalities for instrument load. The accomplished test was derived of norms for software products shown in the Picture 5. This picture presents the main process of user's test activities. The tests plan for the component ETL validation went by necessary functionalities to load this instrument.
Table 5. Criteria testing for product.
The test performed was derived for software products standards shown in Table 5. After making the selection of test's types' criteria's for the ETL component, began the process validation and verification described in Table 6.
Table 6. ETL Validation and Verification Process. Process ETL verification and validation
Test criteria for product
in DM to factory's software from LPCM. Represented by a process defined the PSP. Without the formal process presented, the methodology used for line production becomes uneconomical to maintain the software factory business. The ETL created such standard was used SQL Server 2000 and the modeling visual applied, generated more than 1400 lines of code (LOC) in SQL Server, performed on a component. The ETL methodology presented requirements for the assembly of DM the SP and DM of LPCM. The creation of this system is advisable as to the manual activities use, consistency and your gain is realized in that production use is obtained on a large scale. To continue this research is recommended: 1 - The use of concepts of ETL, 2 - Understanding the methodologies of DM and DW projects, 3 - Knowledge of PCP for CMMi, TSP and PSP for use on production lines software such as manufacturing cells LPCM [1] [9] [12]. For future work, it is suggested that research results are applied to the use of data mining in, for example, in Operational Research (Pesquisa Operacional - PO).
The authors of this research would like to thank: 1 - When GPES for support the development of methodological research; 2 - ITA by the availability of resources for infrastructure software; 3 - Tata Consultancy Services (TCS) the possibility of partnerships with the productive sectors academics.
[1] Montini, Denis Ávila, Modelo de indicadores de risco para o orçamento de componentes de software para célula de manufatura. / Denis Ávila Montini. 360p. Dissertação (Mestrado) em Engenharia de Produção – Universidade Paulista - (2005).
[2] Laundon, Kenneth C. & Laudon, Jane P. 7 ª Edição.
[3] Albrecht, Allan J. Function Points as a Measure of Productivity, Act as do 53 rd meeting of GUIDE International Corp. Guidance for users of integrated data processing equipment conference, Dallas, 1981.
[4] Cunha A. M, “CE - 240 Projeto de Sistema de Banco de Dados”, ITA – Instituto de Tecnologia da Aeronáutica. Disponível em: http://www. ita. br/ ~cunha, Acesso em: Dezembro de 2007.
[5] Cunha A. M, “CE - 245 Sistema de Informação”, ITA
[6] Newcomer, Eric; Lomow, Greg. “ Understanding SOA with Web Services ”. Addison Wesley. ISBN 0 - 321 - 18086 - 0. Edição, 2005.
[7] Kimball, R., The Data Warehouse ETL Toolkit. 1ª Edição, Wiley, 2004.
[8] Humphrey, Watts. Introducing Personal Software Process (PSP) and Team Software Process (TSP) into an Organization August /1997. Disponível no site: <http://www. sei. cmu. edu> Maio /2009.
[9] Montini, Denis Ávila; Moreira, Gabriel De Souza; Vieira, Luiz Alberto; Battaglia, Danilo; Gnatiuc, Carlos Eduardo; Cunha, Adilson Marques Da; Estudo de caso de uma estratégia de integração de middleware para um serviço SOA de gerenciamento e controle de fábrica de software: TCS – Tata Consultancy Services – Intranet website de Base de Conhecimento Corporativa KnowMax: TACTICS Iberoamerica 2007, Brasil, São Paulo, SP 25 - 26/Outubro/2007.
[10] MAIN, Jeremy. Quality wars: the triumphs and defeats of American business. New York: The Free Press, 1994.
[11] CMMi, Version 1. 2 - CMMI - DEV, V1. 2, CMU/SEI - 2006 - TR - 008 - ESC - TR - 006 - 008 - Improving processes for better products. Agosto de 2006.
[12] MARCONDES, F. S., Vega I S, Dias L A V. A Formal Use Case Type Description by Model Transformation: ICSSEA, 2008b.
[13 ] Patterns and Applications 1996.
[14]Davenport, Thomas H., Reengenharia de Processos. Rio de Janeiro: Campus, 1999.
[15] Hammer, Michael M. ; Champy, J. Reengenharia: revolucionando a empresa em função dos clientes, da concorrência e das grandes mudanças da gerência. Tradução de Ivo Korytowski. - Rio de Janeiro : Campus, 1994.