










Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
# Dataflow Notes – Complete Study Material and Lecture Guide ## Subject Information * **Subject:** Dataflow / Data Flow Computing * **Course:** Computer Science / Information Technology * **Level:** Diploma / Undergraduate * **Academic Year:** 2025/2026 * **Category:** Lecture Notes / Revision Material / Exam Preparation * **Format:** PDF / DOCX / Typed Notes * **Language:** English --- # Description These notes provide a detailed introduction to Dataflow concepts used in computer science, software engineering, distributed systems, and computer architecture. The document explains theoretical concepts, diagrams, examples, and practical applications of dataflow systems in modern computing environments. The notes are organized in a clear and easy-to-understand format suitable for: * Students preparing for exams * Assignment research * Quick revision * Class reference * Self-study
Typology: Study notes
1 / 18
This page cannot be seen from the preview
Don't miss anything!











At the end of this Topic you should be able to:
Data-flow diagrams (DFDs) model a perspective of the system that is most readily understood by users – the flow of information through the system and the activities that process this information. Data-flow diagrams provide a graphical representation of the system that aims to be accessible to computer specialist and non-specialist users alike. The models enable software engineers, customers and users to work together effectively during the analysis and specification of requirements. Although this means that our customers are required to understand the modeling techniques and constructs, in data-flow modeling only a limited set of constructs are used, and the rules applied are designed to be simple and easy to follow. These same rules and constructs apply to all data-flow diagrams (i.e., for each of the different software process activities in which DFDs can be used).
An example of part of a data-flow diagram is given below. Do not worry about which parts of what system this diagram is describing – look at the diagram to get a feel for the symbols and notation of a data-flow diagram.
As can be seen, the DFD notation consists of only four main symbols:
Four basic elements are used to construct data-flow diagrams:
A number of different notations exist for depicting these elements, although it is only the shape of the symbols which vary in each case, not the underlying logic. This unit uses the Select SSADM notation in the description and construction of data-flow diagrams. As data-flow diagrams are not a part of the UML specification, ArgoUML and Umbrello do not support their creation. However, Dia is free software available for both Windows and Ubuntu which does support data-flow diagrams.
Processes are the essential activities, carried out within the system boundary, that use information. A process is represented in the model only where the information which provides the input into the activity is manipulated or transformed in some way, so that the data-flowing out of the process is changed compared to that which flowed in. The activity may involve capturing information about something that the organisation is interested in, such as a customer or a customer's maintenance call. It may be concerned with recording changes to this information, a change in a customer's address for example. It may require calculations to be carried out, such as the quantity left in stock following the allocation of stock items to a customer's job; or it may involve validating information, such as checking that faulty equipment is covered by a maintenance contract.
Processes are depicted with a box, divided into three parts.
The top left-hand box contains the process number. This is simply for identification and reference purposes, and does not in any way imply priority and sequence. The main part of the box is used to describe the process itself, giving the processing performed on the data it receives. The smaller rectangular box at the bottom of the process is used in the Current Physical Data-Flow Diagram to indicate the location where the processing takes place. This may be the physical location
— the Customer Services Department or the Stock Room , for example. However, it is more often used to denote the staff role responsible for performing the process. For example, Customer Services , Purchasing , Sales Support , and so on. Rules The rules for processes are:
The directed line is labelled with the data-flow name, which briefly describes the information contained in the flow. This could be a Maintenance Contract , Service Call Details , Purchase Order , and so on. Data-flows between external entities are depicted by dashed, rather than unbroken, lines. Rules The rules for drawing data-flows are:
Names consist of a singular noun describing the role of the entity. Above the label, a lower case letter is used as the identifier for reference purposes. Rules The rules associated with external entities are:
Where more than one copy of an external entity appears on a diagram it has a cut off corner in the top left, such as below:
When more than one copy of a data store appears on a diagram it has a cut off left-side, such as below:
The context diagram is used to establish the context and boundaries of the system to be modelled: which things are inside and outside of the system being modelled, and what is the relationship of the system with these external entities. A context diagram, sometimes called a level 0 data-flow diagram , is drawn in order to define and clarify the boundaries of the software system. It identifies the flows of information between the system and external entities. The entire software system is shown as a single process. A possible context diagram for the Video-Rental LTD case study is shown below.
The process of establishing the analysis framework by drawing and reviewing the context diagram inevitably involves some initial discussions with users regarding problems with the existing system and the specific requirements for the new system. These are formally documented along with any specific system requirements identified in previous studies. Having agreed on the framework, the detailed investigation of the current system must be planned. This involves identifying how each of the areas included within the scope will be investigated. This could be by interviewing users, providing questionnaires to users or clients, studying existing system documentation and procedures, observation and so on. Key users are identified and their specific roles in the investigation are agreed upon.
In order to produce the context diagram and agree on system scope, the following must be identified:
We can see that on this level 1 DFD there are a number of data stores, and data-flows between processes and the data stores. It is important to notice that the same data-flows to and from the external entities appear on this level 1 diagram and the level 0 context diagram. Each time a process is expanded to a lower level, the lower level diagram must show all the same data-flows into, and out of the higher level process it expands.
If no context diagram exists, first create one before attempting to construct the level 1 DFD (or construct the context diagram and level 1 DFD simultaneously). The following steps are suggested to aid the construction of Level 1 DFD:
We have already seen how a level 0 context diagram can be decomposed (exploded) into a level 1 DFD. In DFD modeling terms we talk of the context diagram as the “parent” and the level 1 diagram as the “child”. This same process can be applied to each process appearing within a level 1 DFD. A DFD that represents a decomposed level 1 DFD process is called a level 2 DFD. There can be a level 2 DFD for each process that appears in the level 1 DFD. A possible level 2 DFD for process “2: Loan of video” of the level 1 DFD is as follows:
Note, that every data-flow into and out of the parent process must appear as part of the child DFD. The numbering of processes in the child DFD is derived from the number of the parent process – so all processes in the child DFD of process 2, will be called 2.X (where X is the arbitrary number of the process on the level 2 DFD). Also there are no new data-flows into or out of this diagram – this kind of data-flow validation is called balancing. Look at the rectangular boundary for this level 2 DFD. Outside the boundary is the external entity “Customer”. Also outside the boundary are the two data stores – although these data stores are inside the system (see the level 1 DFD), they are outside the scope of this level 2 DFD. Constructing level 2 (and lower) DFDs — functional decomposition The level 1 data-flow diagram provides an overview of the system. As the software engineers' understanding of the system increases it becomes necessary to expand most of the level 1 processes to a second or even third level in order to depict the detail within it. Decomposition is applied to each process on the level 1 diagram for which there is enough detail hidden within the process. Each process on the level 2 diagrams also needs to be checked for possible decomposition, and so on. A process box that cannot be decomposed further is marked with an asterisk in the bottom right hand corner. A brief narrative description of each bottom-level process should be provided with the dataflow diagrams to complete the documentation of the data-flow model. These make up part of the process definitions which should be supplied with the DFD. Each process on the level 1 diagram is investigated in more detail, to give a greater understanding of the activities and data-flows. Normally processes are decomposed where:
Always bear in mind that levelling is meant to simplify and clarify the diagrams, and if this cannot be done then it may be best to leave the diagram as it is.
The key to successfully levelling is to make the diagrams balance. For example, if a second level diagram expands a first level process then all the inputs to the process must be inputs to the second level diagram, and all the outputs from the process must be outputs on the second level diagram. Moreover, there must be no other inputs and outputs. To be particular, all the inputs and outputs of the system which appear on the context diagram must appear on the level one diagram and there should be no other inputs and outputs on the level one diagram. This does not mean there can be no changes to the higher levels of a set of diagrams. When producing a lower level diagram, the software engineer may realise that a new input is needed for the process to be able to carry out its task. In which case, the software engineer should add this data-flow as an input and then add the input as a data-flow to the original process. If needs be, this input may be added at several levels higher up. The software engineer may add new outputs in the same way. As long the diagrams always balance, inputs and outputs can be added and removed wherever necessary.
Numbering in a levelled set of diagrams is important, as the numbers help you to find your way around the levels. It is easily described by example. Suppose Receive Order is the process numbered 3 on the level one diagram (remember, numbers do not indicate any order, they are simply labels) and this is expanded to a level two diagram. The process numbers on the level two diagram will be 3.1, 3.2, 3. and so on. Suppose now that process 3.4 on the level two diagram is Register New Customer and needs further expansion to a level three diagram. The process numbers on this diagram will be 3.4.1, 3.4.2, 3.4.3 and so on. The rule used here is this: if X is the number of the process you wish to expand, then the numbers on the next level are X.1, X.2, X.3... The same applies for data stores. Data stores that appear in a level two diagram expanding a process labelled 4 in the level one diagram will be numbered D4.1. D4.2, D4.3 and so on. Deeper levels will be D4.1.1, D4.1.2, the numbering scheme being just the same as for processes. Note though, it is not the data stores that are expanded. They may simply appear in the expansion of a process.
A software engineer may define a process where no further expansion is appropriate because there are no separate sub-processes which may make up the original process. However, the software engineer may still wish to describe the process in more detail as it is a particularly difficult or tricky process. In this case, the software engineer writes down a process description for the process. This can take any form which the software engineer thinks appropriate. Traditional flowcharts could be used or plain English. More common is what is called structured English. This looks like English only it is written more like a computer language. It used to avoid the problem that different people reading the same piece of plain English may understand it in different ways.
It should be clear that producing data-flow diagrams can be complicated. A routine check using the following questions should make sure that you find any simple mistakes. The first set of questions refer to a single diagram, so if you have a set of levelled data-flow diagrams then these checks need to be made for each diagram.
Just as for logical data structures, to make a data-flow diagram we must analyse the requirements and describe the system in terms of the components of data-flow diagrams. Several attempts will be needed before a final and complete model of the system can be produced. Unfortunately, there is no straightforward way to progress through developing a data-flow diagram. We are aiming therefore to build a skeleton model on paper which we can then work with and develop more fully. Each stage will be illustrated with examples from the following Estate Agent case study (repeated from ???).
and so should have a different name from the input data. Property Details are taken and recorded as a property on the file so the output could be just something like Recorded Property or more simply, Property. Now start again only using this output as a new input. It must either go to another process, to a data store or to an external entity. It should be clear from the case study what happens. If a new process is needed then do the same again. Find a sensible name for the process using the case study, determine and label the outputs and then follow the outputs. If the data is stored then add a data store to the diagram, name it sensibly from the case study and draw the output arrow going into the data store. This is what happens in Property and so we add the data store Properties. If the output is an output from the system then simply add the external entity which receives the output. When the data is finally output or comes to rest in a data store, go back and follow any of the other outputs which may have been defined on the way. When they are exhausted, choose a new input and follow that through in exactly the same way. Follow events Another way to approach building up a data-flow model is to consider what happens in the system. The case study will outline a number of events. There must be processes in the system which respond to these events or even make them happen. Identify these processes and then add the data inputs which are used by the process and determine the outputs. For example, in the estate agent example, there is the phrase, “When a sale is completed...”. This is an event: a sale is completed. From the case study we see that lots of things then happen: the buyer confirms exchange of contracts so this is an input to some process; the client details are removed from the file and invoice is sent out. This is the process. A sensible name might be Record Sale or possibly Receive Sale Confirmation. The data needed is the input from the Buyer and Client details which are on file. This must mean there is a data store somewhere on the diagram holding this information. If there is not one there already then add it. And the output must be an invoice to the Client. From here on the approach is the same as following inputs. For any new outputs, work out where those outputs must go and if it is to a process follow them as if they were inputs to the new process. Most processes can be found in the case study using either technique of following inputs or following events. However, some processes are related to temporal events and so can only be found by following events. As the name suggests, temporal events are events which occur at specific times. They are not prompted to happen by the arrival of new data, but rather because a certain time has been reached. These events often appear in case studies beginning with phrases such as, “Once a month...” or, “At the end of every day,...”. However, once these have been identified, producing the model by following this event is exactly the same as for any other event. In the estate agent system, there are two temporal events: there is a weekly matching of potential buyers with properties; invoices and reminders are sent out on a monthly basis. Though time is the trigger the processes carrying out temporal events, time is generally not shown on the data-flow diagram. This is because the time aspect is often just a practical implementation rather than rigid necessity. For example, the matching of buyers and properties at the estate agents need not be weekly. It is probably done weekly so that it always gets done, and also so that it does not interrupt the other daily business. With an automated system it may be possible to match buyers with properties as soon as any new details on either arrive. Where time is crucial to a process, say accounting done at the end of a financial year, then this can be reflected in the name of the process. For example, “Calculate end of year profits”.
Fill in gaps After building a model that handles each input or each event, it is worth going over the processes defined so far. For each process, ask the question, “Does this process have all the information it needs to perform its task?” For instance, if a process sends out invoices, does it have all the details of the invoice and the address of where the invoice should go? If the answer is No , then add a data-flow into the process which consists of the data needed by the process. If there are several, clearly distinct items of data needed, then you may need an arrow for each item. Now try to identify the source of the data. First, see if the data can be found already inside the system, either on a data store or as a result of a process. If not, it may be that the data can be obtained by processing some of the existing data in, which case add a new process that takes the existing data and makes the data you require. Or, the data may be available but from the case study it is clear that there is a time-lag between the process that produces the data and the process which uses it. Simply add a data store where the data can reside till it is needed. If there is still no source for the data then it could be from an external entity. In which case, this is a new input to the system. It may not be explicitly mentioned in the case study, but if it is necessary then it should be added. Having added the new input from the appropriate entity, go back and correct the context diagram. This is an important task. If there is not enough data to support a task then the system will not function properly. Of course, to be on the safe side, you could have all the data going to all the processes! But this is not really a solution because with a large system this would be impractical. Having examined all the processes, check that all the outputs have been generated. All of the inputs should have been covered already, but this does not mean that all the outputs have been produced. If there is still an output which does not appear on the diagram, see if there is a process where it could come from. If there is no sensible candidate, add a process and begin to work backwards. What inputs does the new process need? Where do these inputs come from? This task is almost the same as the one just described. Any left over outputs must have come from a process. Outputs cannot come from data stores or external entities. If there is no sensible way to fit the output into the diagram then it may be that it is not a sensible output for the system you are currently considering. Use the case study to confirm this. Finally, check the data stores. Data must enter a data store somehow and generally data on a data store is read. For each data store, identify when the store is either written or read by considering the processes which may use the data. Also, use the case study to see that you have not missed any arrows to or from a data store. Repeat By this stage, you will have considered all the inputs, all the outputs and produced a first draft of the data-flow model of the system. Review the case study, looking for functionality described which is not performed by the model. In particular, look for temporal events as these are sometimes hidden implicitly in the text. Where necessary, add new processes that perform the omitted functions and use the method of following events to work out their inputs and outputs. Fill in the gaps of the model in exactly the same way as was done to produce the first attempt.