Bussines Managements, Lecture notes of Knowledge Management

The ETL process model for a manufacture cells production line integration. It explains how the unified information in the proposed DM is loaded from Management Information Systems (MIS), with historic data from software development project. The document also discusses the creation of a Decision Support System (DSS) using Online Analytical Processing (OLAP) to support quantitatively the Process Areas of Measurement and Analysis (MA) of Capability Maturity Model Integration (CMMi).

Typology: Lecture notes

2022/2023

Available from 06/29/2023

allan_mith
allan_mith 🇺🇸

1 document

1 / 7

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
ETL Process Model for a Manufacture Cells Production Line Integration
Harison Pereira Bila de Carvalho
TCS – Tata Consultancy Services
- Brazil
Danilo Battaglia
TCS – Tata Consultancy
Services - Brazil
Denis Ávila Montini
TCS – Tata Consultancy
Services - Brazil
Gabriel de Souza Pereira Moreira
ITA – Aeronautics Institute of
Technology - Brazil
Luiz Alberto Vieira Dias
ITA – Aeronautics Institute of
Technology - Brazil
Paulo Marcelo Tasinaffo
ITA – Aeronautics Institute
of Technology - Brazil
Abstract
This article describes a Causal Analysis and Resolution of
a problem for the elaboration of a Data Mart – DM
specialized in Production Control and Planning – PCP.
The unified information in the proposed DM is loaded
form Management Information Systems – MIS, with
historic data from software development project. To allow
the integrated load from different data sources, it was
developed a Extract, Transform and Load – ETL process,
whose implementation has integrated data from the MIS
of Function Point Analysis and Personal Software
Process – PSP in a software factory. That allowed the
construction of a Decision Support System – DSS over the
constructed DM. This DSS allows the use of Online
Analytical Processing – OLAP to support quantitatively
the Process Areas of Measurement and Analysis – MA of
Capability Maturity Model Integration – CMMi.
Key Words: FPA, ETL, OLAP, Data Mart, CMMi.
1. Introduction
Researchers from TCS and Research Group of
Software Engineering (Grupo de Pesquisa de Engenharia
de Software – GPES) from the Brazilian Aeronautics
Institute of Technology (Instituto Tecnológico de
Aeronáutica – ITA), based on some Process Areas of
Capability Maturity Model Integration – CMMI, created a
process of Extract, Transform and Load – ETL to
software factories specialized in Line Production Lines of
Manufacture Cell – PLMC [1].
It was specified, dimensioned, projected and tested a
process, by means of Causal Analysis and Resolution –
CAR to integrate in a Data Mart information of executed
software projects [2] from different data sources. The
created Data Mart consolidates information from the MIS
of software estimation by Function Point Analysis – FPA
[3] and of MIS of Personal Software Process – PSP, keep
in the corporation.
The objective is to allow the use of Online Analytical
Processing – OLAP [4] to the Process Area of
Measurement and Analysis – MA from Capability
Maturity Model Integration – CMMI. The diverse
concepts of software engineering and quality that was
used in the construction of a Decision Support System –
DSS using OLAP [5] formed the base to the developed
Control Projects System.
The Data Mart – DM was modeled in a way to supply
a base to Knowledge Discovering in Databases – KDD
[2], like Data Mining – DM.
The SIG’s project followed the Service Oriented
Architecture – SOA [6] to the development of the needed
environment to the integration models.
The work objective was to develop an ETL process
with the technical requirements of SOA application to
supply a DM of a PLCM.
2. Research Context
ETL is a process methodology for Extracting,
Transforming and Loading data from transactional
databases to multidimensional databases, like Data Marts
– DM or Data Warehouses – DW. The focus of a DW is
to store the information in a way that allows Decision
Support for enterprise business with high-performance.
The ETL phase is the longest and hardest of the
construction of a DW [7]. It can be used to uniform and
integrate the diverse data sources from a PLMC, which
have a systematic process to register the behavior and
software results engineering teams [8] [9] [10]. The ETL
can help the decision and administration process of an
enterprise together with a tool set of methodological
foundation to identification and analysis. In our case
study, ETL has collaborated in the calibration of
dimensioning variables of PLMC projects, by means of
data loading from FPA repository [3].
The creation of a Decision Support System aimed to ease
the knowledge discovery and allowed the identification of
patterns doesn’t easily perceived by manual analysis
through OLAP. The most relevant patterns and trends can
pf3
pf4
pf5

Partial preview of the text

Download Bussines Managements and more Lecture notes Knowledge Management in PDF only on Docsity!

ETL Process Model for a Manufacture Cells Production Line Integration

Harison Pereira Bila de Carvalho

TCS – Tata Consultancy Services

- Brazil

Danilo Battaglia

TCS – Tata Consultancy

Services - Brazil

Denis Ávila Montini

TCS – Tata Consultancy

Services - Brazil

Gabriel de Souza Pereira Moreira

ITA – Aeronautics Institute of

Technology - Brazil

Luiz Alberto Vieira Dias

ITA – Aeronautics Institute of

Technology - Brazil

Paulo Marcelo Tasinaffo

ITA – Aeronautics Institute

of Technology - Brazil

Abstract

This article describes a Causal Analysis and Resolution of a problem for the elaboration of a Data Mart – DM specialized in Production Control and Planning – PCP. The unified information in the proposed DM is loaded form Management Information Systems – MIS, with historic data from software development project. To allow the integrated load from different data sources, it was developed a Extract, Transform and Load – ETL process, whose implementation has integrated data from the MIS of Function Point Analysis and Personal Software Process – PSP in a software factory. That allowed the construction of a Decision Support System – DSS over the constructed DM. This DSS allows the use of Online Analytical Processing – OLAP to support quantitatively the Process Areas of Measurement and Analysis – MA of Capability Maturity Model Integration – CMMi.

Key Words: FPA, ETL, OLAP, Data Mart, CMMi.

1. Introduction

Researchers from TCS and Research Group of Software Engineering ( Grupo de Pesquisa de Engenharia de Software – GPES ) from the Brazilian Aeronautics Institute of Technology ( Instituto Tecnológico de Aeronáutica – ITA ), based on some Process Areas of Capability Maturity Model Integration – CMMI, created a process of Extract, Transform and Load – ETL to software factories specialized in Line Production Lines of Manufacture Cell – PLMC [1]. It was specified, dimensioned, projected and tested a process, by means of Causal Analysis and Resolution – CAR to integrate in a Data Mart information of executed software projects [2] from different data sources. The created Data Mart consolidates information from the MIS of software estimation by Function Point Analysis – FPA [3] and of MIS of Personal Software Process – PSP, keep in the corporation.

The objective is to allow the use of Online Analytical Processing – OLAP [4] to the Process Area of Measurement and Analysis – MA from Capability Maturity Model Integration – CMMI. The diverse concepts of software engineering and quality that was used in the construction of a Decision Support System – DSS using OLAP [5] formed the base to the developed Control Projects System. The Data Mart – DM was modeled in a way to supply a base to Knowledge Discovering in Databases – KDD [2], like Data Mining – DM. The SIG’s project followed the Service Oriented Architecture – SOA [6] to the development of the needed environment to the integration models. The work objective was to develop an ETL process with the technical requirements of SOA application to supply a DM of a PLCM.

2. Research Context

ETL is a process methodology for Extracting, Transforming and Loading data from transactional databases to multidimensional databases, like Data Marts

  • DM or Data Warehouses – DW. The focus of a DW is to store the information in a way that allows Decision Support for enterprise business with high-performance. The ETL phase is the longest and hardest of the construction of a DW [7]. It can be used to uniform and integrate the diverse data sources from a PLMC, which have a systematic process to register the behavior and software results engineering teams [8] [9] [10]. The ETL can help the decision and administration process of an enterprise together with a tool set of methodological foundation to identification and analysis. In our case study, ETL has collaborated in the calibration of dimensioning variables of PLMC projects, by means of data loading from FPA repository [3]. The creation of a Decision Support System aimed to ease the knowledge discovery and allowed the identification of patterns doesn’t easily perceived by manual analysis through OLAP. The most relevant patterns and trends can

be consolidated in technical reports in an automated way [10]. These reports can be stored and published to the department that supplies the base to the Process Area. The Decision Support System was purpose to the PLMC. It was developed to support the Quantitative Project Management – QOM Process Area - PA, described by Carnegie Mellon University as an integrated part of CMMI version 1.2 [11]. One of the interest areas of this research was the alternative solutions analysis for the data

universe that a PLMC needs to operate [12][13]. The

study consequence was the ETL process customized prototyping to a DM for a PLMC.

3. Problem Definition

The initial concern was to identify the dimensions to be modeled from the transactional MIS. It is required the multidimensional modeling of DW before the ETL process definition. After that, it was identified some alternatives to the resolution of automated generation project reports process, whose information would be provided by the DW modeled. To the execution of the automated ETL, it was necessary the Extraction activities planning, Transforming and Load. The first part of this ETL process involved data extraction from the MIS databases of FPA and PSP. After that, Transformation activities were critical to ensure that the data can be grouped and summarized like the DW model. Transformation usually involves rules application and functions over the raw data extracted from data sources to allow its compatibility with the model. The problem statement was: “To refine ETL methods to integration of MIS in DW of software engineering projects, which allowed the gauge of dimensioning by means of automated generation of technical reports” [12]. Data Marts are multidimensional databases generally specialized in one department of a company. When it is obtained data from more than one area, that interest to the general goals of the company, it can grow up to a Data Warehouse. So, DW can emerge from top-down or bottom-up approach. Top-Down is when a company creates a DW and after that goes to segmentation, dividing in minor areas, oriented by department subjects. In Bottom approach, the company prefers first create a Data Mart for most important or critical departments. That approach costs less than the DW and reduce risks. Then, when the Data Marts requirements are mature enough, the focus goes to their integration in a DW. The hardware and software infra-structure of DM and DW can be similar, but the data architecture can be significantly different. The SOA defined platform unified the needed requirements to the technological implementation of the business in a three-tier environment. In the database tier, it was adopted the Bottom-up strategy to obtain the DW

from the MIS FPA and PSP. The objective was to allow OLAP analysis to supply a tool for decision taking. It was chosen as OLAP tool the SQL Server 2000 that allowed all the process of collection, storage and publishing of the needed information. It was adopted Key Performance Indicators – KPIs to provide to the management quick insights about the projects situation. The identified problem was applied over the software engineering projects by means of a computational subsystem that allowed a new control and automation of MIS models, filled with manual data entries from distributed databases [13].

4. A software problem CAR

The process re-engineering was based with on monitoring data of the process. The CAR process applied has oriented the existing refinement collection data manipulation process and information about dimensioning and software control. The start point of the process was to understand the reason for dimensioning data tabulation lateness provided by FPA. From this point, it was initialized a process of improvement of this bottle-neck. In the data analysis, the company aimed to get a new level of knowledge on your own production. Each software development company knows how much of its resources to the execution of a specific contract. It is fundamental to strategic planning maintenance that projects, to be developed, are monitored and its limitations are known. With this strategic vision, the embedded problems in software development process can be identified and resolved on time. It is a fundamental point to justify the use of OLAP. The start point to the difficult found in a PLMC of TCS was a reverse engineering realization. Based on the concepts suggested by Sommerville, an analysis process must have a study object, and in this case it was the information automation process about the software dimensioning projects in a PLMC. The technical capability to recover information about the original project could be applied aiming to optimize the next projects [14]. For the enterprise continuous offering of projects and its survival in the competitive business, TCS had to analyze its competitors and observe their performance. The comparing with its own results has occurred to fundament a strategic decision in production lines to the products offering and services. The conscious competition administration loss allowed performance improvement re-planning activities. In this vision, the institution has performed processes revisions aiming to find a continuous improvement for its process, to revitalize your organizational projects. A commission of Research and Development – R&D analyzed objective indicators against the monitored Process Areas that

new indicator should be done until the point of working together with the previous system. The team of R&D needs to avail themselves of all artifacts related to the process to ensure the correct record of the occurrence, when the team stabilize the process and indicator does not present the problem in an activity similar to the replacement should be implemented successfully. 5 - For complex cases in which the solution is beyond the material and intellectual resources of the occasion, a research stage to obtain this knowledge is central to solving the problem. Conducted research and made the knowledge acquisition, a new prototype should be done and analyzed again by their method. 6 - The team of R&D should follow the development team to make sure that the implementations were carried out according to the determined.

4.1.3 Homologation Stage:

7 - During development, the R&D team can verify if the implementation and the requirements have been met in accordance with the request. 8 - The new process has been tested in operation and had to stay under observation up to the validation after batteries of tests in production. 9 - To prevent some collected knowledge part was lost, and depending on complexity, the project type and risk involved, the R&D teams have created a technical report on the component's project. But what is the advantage achieved by this reverse engineering process as sophisticated? The advantage was that with this scientific method used, the problems are replaced by a formal treatment as well as its solution. The first part of the problem identification of system studied was the classification of the patterns of problem presented in Table2 [1]. Table 2: Classification Generic Standards Issues

This action was essential to that records' observations obtained allowing an analysis applied later. The assembly of the DM with data basis distributed with APF estimates from projects was the first step to identify the behavior of

LPCM. In this study the data collected came from more than one basis by estimates of APF project to be processed individually in order to obtain a specialized DM. The Table 2 the team of P&D found that the patterns of problems identified in this analysis (CAR) was the analysis pattern. Based in the distributed projects databases the production teams began manual analysis activities for an APF atmosphere parameters definition. In created DM they were accomplished manual consolidations that originated difficulties in an acquisition maintenance of these registrations. After the process focus bottle neck identification, P&D team solved mapping process. This resolution is presented in the Picture 3.

Table 3: Process for Identifying Patterns of Problems and Solutions

The MIS identification scenario needed to be supported by a technology that had the capability to integrate flocks distributed data. In research in the database area, methods that suited the needs identified by the analysis were met by the ETL method. Complementing the reasoning line presented during the developing ETL component process with an approach to be chosen by the team of R&D. Since the beginning of the process sought the establishment of a working prototype.

5. Reduced scope

Generic Pattern Classification Problem Architectural: The structural organization of systems hardware and software. Project: The organization of activities around tasks for the performance of a product. Process: The organization in an orderly manner in the task within an activity. Programming: Introduces the programming solutions to implement the requirements requested by the client. example - the choice of language. Analysis: The process of verification and validation of documents and records to ascertain whether the specification of a client was treated in accordance with the contract.

Identification Process

Issues Standards Solutions Standards

  1. The R&D team is organized to define a methodology and stabilize the engineering process of software development.

ETL was the methodology defined.

  1. The team develops a model of standards to be used in the project. Each standard must be configured with specific techniques for each step. The techniques need to check and test the technology and standard processes. With these records, there is evidence that the effective resolution is to which proposed.

Were identified a pattern of structured coding ETL component.

  1. If your system standard has not identified any problems, through research, identify the focus of the problem, in database query or other known sources in addition to the documentation of patterns.

The parameterization of the data structures of the two distributed databases to be integrated with the data mart.

  1. When a problem is very specific and however the company has not a solution to the need.

Was developed an ETL standard for the integration of new distributed databases.

The MIS integration model has established specifications for the construction of an SQL Server 2000 ETL component, which contains the MIS of APF and PSP in order to meet the subjects terms studied in the program. Were organized activities, processes, methods, techniques and software engineering tools to integrate the solution according to guidelines development project [11]. The quality of products and processes will be used in the project with the quality software engineering standards along with the CMMi guidelines [1]. One research products was a CAR methodology for ETL components creation.

6. Development ETL component

To compose an instrument new, there are following steps to create the prototype of analysis' instrument, so presented himself process defined that enable the development, though is possible future considerations. The proposed study has the following steps in Table 4: To implement the ETL component was necessary a theoretical and practice study on the technology, this step is part of the learning process. Based on the data structure, the modeling provided an analysis of ETL could be applied to determine the requirements integration specification. One of the requirements were functional ETL verification and validation. Table 5 presents the main testing activities process of User. For the data collection and technical information necessary for the development of the ETL component used the Delphi method [1]. The test plan for validation of the component passed the ETL functionality needed to load the instrument. The test performed was derived standards for software products shown in Table 5. This table shows the main activities of the testing process of User. For the data collection and technical information necessary for the development of ETL component we used the Delphi method [22]. The test plan for validation of the ETL component passed by functionality needed to load the instrument. Table 4. Process of development of ETL. Development process of the ETL component 1 - Formulation of the problem. 2 - Planning research. 3 - Elaboration of ETL component. 4 - Load Testing. 5 - Verification of the data. 6 - Data analysis and interpretation of data and results. 7 - Presentation of data and results. 8 - Verification and Validation.

The Picture 5 presents main activities of user's test process. For data rising, it was necessary technical information for component ETL development used - if method Delphi [22].

The plan of tests for a component ETL validation went by necessary functionalities for instrument load. The accomplished test was derived of norms for software products shown in the Picture 5. This picture presents the main process of user's test activities. The tests plan for the component ETL validation went by necessary functionalities to load this instrument.

Table 5. Criteria testing for product.

The test performed was derived for software products standards shown in Table 5. After making the selection of test's types' criteria's for the ETL component, began the process validation and verification described in Table 6.

Table 6. ETL Validation and Verification Process. Process ETL verification and validation

  1. Find at least three examples in which the load of data was performed effectively using the same solution.
  2. Declare the hypothetical solution as standard candidate.
  3. Execute an activity of the Delphi method to improve the description of the candidate to share itself with others.
  4. Implement the ETL candidate for another project of software development.
  5. Declare the standard candidate if your application is successful With all the planning done, begins to prototype construction step to a production line [9] [1]. The implementation must to be monitored [12], since it becomes the bridge between planning and bring data. If it is not very well thought, the work might be compromised by increasing project risk. [1].

6. 1. Specification model suggested

Test criteria for product

  1. Reliability: Data storage reliability Indicators, data integrity, backup, security, ability to handle failure and risk analysis.
  2. Usability: The usability criteria are defined as follows: consistency User Interface, Internationalization, ease of learning, quality documentation tool, quality training material, diagnostic clarity, ease of installation, acceptable response time and help on - line
  3. Maintainability: responsibility, traceability, compatibility product delivered.
  4. Efficiency: Ability of developing an instrument that their requirements for data storage are fast and dynamic.
  5. Effectiveness: capability to correctly portray the analysis.
  6. Portability: The ability to move data and share them in network environments.

in DM to factory's software from LPCM. Represented by a process defined the PSP. Without the formal process presented, the methodology used for line production becomes uneconomical to maintain the software factory business. The ETL created such standard was used SQL Server 2000 and the modeling visual applied, generated more than 1400 lines of code (LOC) in SQL Server, performed on a component. The ETL methodology presented requirements for the assembly of DM the SP and DM of LPCM. The creation of this system is advisable as to the manual activities use, consistency and your gain is realized in that production use is obtained on a large scale. To continue this research is recommended: 1 - The use of concepts of ETL, 2 - Understanding the methodologies of DM and DW projects, 3 - Knowledge of PCP for CMMi, TSP and PSP for use on production lines software such as manufacturing cells LPCM [1] [9] [12]. For future work, it is suggested that research results are applied to the use of data mining in, for example, in Operational Research (Pesquisa Operacional - PO).

9. Acknowledgments

The authors of this research would like to thank: 1 - When GPES for support the development of methodological research; 2 - ITA by the availability of resources for infrastructure software; 3 - Tata Consultancy Services (TCS) the possibility of partnerships with the productive sectors academics.

10. References

[1] Montini, Denis Ávila, Modelo de indicadores de risco para o orçamento de componentes de software para célula de manufatura. / Denis Ávila Montini. 360p. Dissertação (Mestrado) em Engenharia de Produção – Universidade Paulista - (2005).

[2] Laundon, Kenneth C. & Laudon, Jane P. 7 ª Edição.

  1. Livro: Sistemas de Informação Gerenciais.

[3] Albrecht, Allan J. Function Points as a Measure of Productivity, Act as do 53 rd meeting of GUIDE International Corp. Guidance for users of integrated data processing equipment conference, Dallas, 1981.

[4] Cunha A. M, “CE - 240 Projeto de Sistema de Banco de Dados”, ITA – Instituto de Tecnologia da Aeronáutica. Disponível em: http://www. ita. br/ ~cunha, Acesso em: Dezembro de 2007.

[5] Cunha A. M, “CE - 245 Sistema de Informação”, ITA

  • Instituto de Tecnologia da Aeronáutica. Disponível em: http://www. ita. br/~cunha, Acesso em: Dezembro de

[6] Newcomer, Eric; Lomow, Greg. “ Understanding SOA with Web Services ”. Addison Wesley. ISBN 0 - 321 - 18086 - 0. Edição, 2005.

[7] Kimball, R., The Data Warehouse ETL Toolkit. 1ª Edição, Wiley, 2004.

[8] Humphrey, Watts. Introducing Personal Software Process (PSP) and Team Software Process (TSP) into an Organization August /1997. Disponível no site: <http://www. sei. cmu. edu> Maio /2009.

[9] Montini, Denis Ávila; Moreira, Gabriel De Souza; Vieira, Luiz Alberto; Battaglia, Danilo; Gnatiuc, Carlos Eduardo; Cunha, Adilson Marques Da; Estudo de caso de uma estratégia de integração de middleware para um serviço SOA de gerenciamento e controle de fábrica de software: TCS – Tata Consultancy Services – Intranet website de Base de Conhecimento Corporativa KnowMax: TACTICS Iberoamerica 2007, Brasil, São Paulo, SP 25 - 26/Outubro/2007.

[10] MAIN, Jeremy. Quality wars: the triumphs and defeats of American business. New York: The Free Press, 1994.

[11] CMMi, Version 1. 2 - CMMI - DEV, V1. 2, CMU/SEI - 2006 - TR - 008 - ESC - TR - 006 - 008 - Improving processes for better products. Agosto de 2006.

[12] MARCONDES, F. S., Vega I S, Dias L A V. A Formal Use Case Type Description by Model Transformation: ICSSEA, 2008b.

[13 ] Patterns and Applications 1996.

[14]Davenport, Thomas H., Reengenharia de Processos. Rio de Janeiro: Campus, 1999.

[15] Hammer, Michael M. ; Champy, J. Reengenharia: revolucionando a empresa em função dos clientes, da concorrência e das grandes mudanças da gerência. Tradução de Ivo Korytowski. - Rio de Janeiro : Campus, 1994.