Scarica Comutational tools excel, PowerPoint, Power BI e più Appunti in PDF di Informatica gestionale solo su Docsity!
Lecture 1
Basic Excel
Excel is not a database or data source storage. A Table, in reality is calles rainbow of cells = collection of a column, one close to other. Each column has to
- No empty
- A their name -> or when is needed excel will say error
- A single datatype = numbers, text … ( since is a spreadsheet Excel will allow also to put them differently in the table ) Column = collection of inf DATA = row materials, they do not tell anything -> we have to manipulate this Content= title of columns If the columns have a clear meaning is easy a. Customer number = identifier of customers which is unique b. Product = identifier of what the customer purchase c. Production plant = where the product is manufacture d. Order entry date e. Schedule ship date f. Actual ship date g. Invoice date -> document that says that there is document of the transaction ( fattura) h. Sale AMT i. Order quantity j. UNIT PRICE k. Product series = way of grouping the product Row = represents the transaction of one customer (there may be more transaction of the same customer) in a period. What do we do with the table? Act as a reporting SUM -> if we highlight the cells we are interest in low on the right we are given info a. Total sell b. Count = number of cells not empty ( and if this is equal to the row -> means that there are no empty) c. Average = of sum and count Spent = 37 milion Quantity = 130 million Average unit price = 11 dollar They do not match -> so we have to find which is not. It is more likely to an error on the average. We did not considered the quantity.
The weighted average price which we need -> amount spent /quantity of bough Weakness of this technique -> we select all the rows, there may be pollution of the table To select only the cells of the table -> CTRL + SHIFT + FRECCIA GIU ( dalla prima riga) OPERATIONS
- Start with equal
- Enter a first input which is mandatory
- Use the divider that excel tells you to use) -> punto e virgola sostituisce il segno
- To speed up -> due punt first and last cell involved To exclude only single cells -> when we have all the cells highlight -> press SHIFT and narrow down) Copy -> copy and past function it ->
- copying in the same column and different order -> change of the letter ( right or sx) of the letters skipped
- Same column and row -> calculate skip rows and adapt it 3 series -> pivot table manually ( LOOKS LIKE IT BUT IS NOT A PIVOT TABLE , but is the technique that than is used))
- When we filter the color of row is applied (we did for P1) SUBTOTAL-> is sensitive to the filter , but we have to insert the tipo of function of main we want) How to copy and paste only the number and do not the formula= tasto destroy -> copy values The total is never the sum of the subtotals -> it may be the get result but is not calculate rightly -> we have to go back to the selection of all the series PIVOT TABLE AUTOMATIC strength -> is flexible select a single cell in the table -> excel will extend to the limits of the table question
- Upper part = range = gives us name of the sheet, gives left upper corner and bootm right) = if the corners is right the excel understood the table
- Where we want the pivot table to be put -> better a new sheet Create There is a panel on the right -> we use to organise it a. Elements to select ( of the columns ) a. If we flag them we obtain results -> if it numeric excel is programmed to sum them b. If we want to change the automatic function we go to values selcect the column of which we want to do sth -> tasto sinistro -> value flied … change the function
We know that there are no plant that produce series P Put again the series under the production function -> P1 is removed ( that’s an autyomatioon of excel = to remove the empty row ( all the row has to be empty) What happens if data change? Normally we do not What to see -> the source?
- Click on the pivot table
- Analyse
- Change data source When we generate the pivot table there is a copy of the table we started from ( and in fact the dimension is increasing) REFRESH If the operation that enables us to see the influence of the change of datas Pareto rule = if we have a collection of things typically the 20 % of customers consumes the 80% I want to keep the report small -> more columns I add becomes to big -> up to reach the dataset table
Filter section -> when I want to go in details vut I do not want to increase the size, we use the filter
We put the main filter we want in this section -> and select. GRAND TOTAL -> NO FILTYER OF COLUMS AND ROWS BUT OF THE FILTER AREA yes
Tyhe pivot table has the drill throught operations that generalts an additional sheet
showing the rows of the sales table with the data selected on the pivot
- Double click on the table
Doing a graph out of pivot
CLICK RIGHT ON THE NAME OF THE SHEEP WHERE THE TABLE IS
MOVE/XOPY
CREATE COPY
Then analyse -< pivot charge
Function in value of count
COUNT = count od cells in row that are not empty Usually we do not put numbers in the listing -< but ggregate them in the values
UNIT PRICE -> weighted
How to do the weighted average? we need 2 columns at least to do it a. When we do it outside we have to adjust all the time b. In the table A calcolate field is … to generate it we have to be in the pivot ( aany cells in the calculation is ok, no where is a list) Then click in insert In the home -> calculate field B. name it c. change the formula -> select from the list of field : sales AMT / Order Quantity we now in the name wrote CALC bc in this way we easily identify which is a calc or a column in the original table --- is a calculated field that contains a formula no VALES !!! how the numbers are calculated? ( no from other cells of the pivot)
- Filter applied
- Locally calculates the sum of dales AMT and sum of order quantity Why the formula we wrote in the calculated field will do not work in the other system or doing by hand What we insert are not numbers, but the title of the column In excels works bc the system takes the most common operation that is done by the people with that numbers
Know we have update the table, and we have new columns In the Pivot table -> we have to do a refresh so that updates the data to which refers. From now we have the new columns to be possibly selected for the pivot table Do no have ti give for grant something -> we have to specify explicitly what we want. Are we sure that what I am searching for is there just one in that column
- For the customer number yes
- For the product table is not -> we have to check so that the column that we use as key is a primary key in the look up table ( so in the product table) o Primary key: is column where there are no doubles or empties What happens if is not primary? There may be doublings and excel doesn’t give us error but takes the first that finds So primary key is necessary in the look up table, no in the destination table ( sale) How to see if is key or not? If is not unique why? Create a pivot table for the look up table, select the column we want to check if is unique and not
- If is not uniqe the number of row in the pivot table will not be equal to the product rows We need to know why -> we put product both in rows and values -> we have the count for each product -
sort from largest to smallest We see that is not unique Double click on the number we want to have details of -> in this case is 3 ( the highest) we see that they are done in different plants, which have different costs. So be sure if we identify a plant and product combining we obtain a unique combination -> concatenating in a new column doing = column 1 & column 2 We need the primary key in the sales -> product and plants Then the usual look up TOTAL REVENUE – TOTAL COST = margin CM =Contribution margin: is the contribution that each transaction gives to the payment of the fixed cost When the fixed cost is larger than the margin cost there is loss To judge the margin we need the percentage to the sales – contribution margin amount over sales. We see the effect of the sales For the same margin I prefer selling less, since means the less effort But this column cannot be aggregate because is the margin of that line over the sales of that line We want to create a calculated field – we will not use it.
LECTURE 5
We will now refer to measures : it has a more structure -> and will write measures all the time and
we will less create calculated columns.
DIFFERENCE:
- before we had the data in the within the excel file and we generated the pivot table in
it -> EXCEL was both datasource and data analysis
- Now: we divide the data source ( better if in a database) and analysis ( in EXCEL) The data source = BASIC DATA SET file Analysis : a new blank file
HOW to do a business’s intelligence analysis:
We have the data ( in one or more databases ) -> we want to analyse them
- ETL : extract, transform and load EXTRACT: We want to collect the data sources, extract form each data sources one or more tables TRANFORM: is a clean perfect table given then in the next step LOAD : send the transformed data in the next step
- DATA MODEL : we take the clean tables and we connect them into the data model, to avoid the vertical look up. Then we design and create a. Calculated columns ( columns which are added to the original columns in the file) b. Measures ( evolution of the calculated fields) c. Calculated tables ( table not belonging to the datasources, but connected) d. Relationship
- VISUALIZATION
- SHARE ( share the report) All this is the process of business intelligence. In an EXCEL ENVIROMENT we can do all the steps 1 2 and 3 EXCEL ENVIROMENT Power Queries = ETL Power Pivot = Data Model Power Pivot Table = VISUALISATION -> our pivot table will take information from many tables at one time ( data model)
In PRACTICE:
Power queries: to go here we have to go on DATA -get data – from file – file workbook ( extracting). We select all the tables. We load to ( we decide where to load) ONLY CREATE CONNECTION and ADD this data to the DATA MODEL. In this way we have already sent the tables in the DATA MODEL This first database is good for cleaning and extracting
There is the creation of a relationship one to many Returning to the pivot table we see the right result CREATING SLICER -> helps to filter and see only what we want ( is the same as putting the filter but more practical) to do it double click on the row you want to filter for) Collecting calendar to the sales -> same process : we have to decide what to which date in the sales we are interested. Then the calendar will “adapt” and show and connect all the tables to the sales. Collecting THE PRODUCT to the sales -> power pivot tell you I cannot do it since there is no a primary key on at least one side. So we have to create a concatenated column - calculated column - in both product and sale -> PRODUCT TABLE 1- go to the data view 2- insert a new column – primary key. 3- Insert the formula- the We have an ITERATION PROCESS – doing it automatically for all the cells = 'Product'[Product] &'Product'[Production Plant] IT MEANS table product, field product SALE TABLE 1- go to the data view 2- insert a new column – FOREIGN KEY PRODUCT 3- Insert the formula- the We have an ITERATION PROCESS – doing it automatically for all the cells =Sales[Product]&Sales[Production Plant]
RULES TO FOLLOW from now
1. In values only measures
2. We only put columns from the dimension in filters, columns, rows and visual filters
a. the filter has to be applied from the dimension – and these will propagate the
contrary no
basic measures and meta measures
go to power pivot – measures – new measure ->
we create total sale (with the sum) -> SUM(Sales[Sales AMT])/
we create total quantity ( with the sum) - SUM(Sales[Order Quantity]
create the average selling price: SUM(Sales[Sales AMT])/SUM(Sales[Order Quantity]) -> but we
have the already existing definition instead of the code -> so we should write = =[total Sales]/[total
quantity]
DIVIDE -> is better to substitute the “/” with divide -> DIVIDE( numerator;denominator)
we create a meta measure: so a measure that uses other existing measures
if we modify a measure will automatically use the new values.
DASHBOARD EXAMPLE
Final exercise
I have a shop, total sales are my revenue. I have to decide to stay close on Friday, Saturday, Sunday. I want to know how much invoice daily Average revenue per day We have to find a PROXI ( means even if is not precise it’s ok) You did it on your own – try do re do it alone on the file you find the solutio
LESSON 6
Filter vs cross filter
The filter America in sale area is cross filtering the customer number on the customer table -> so
this is also propagated to the sales.
a. customer table is filtered -> sales number is cross filters
b. sale area is filtered -> cust. no in customer is cross filtered and the filter propagates to the
cust. no in the sale
The column I put in the ROW is directly filtered -> and this has a propagation on the transactions
( through the process above described)
1. cross filter on the primary key of that table
2. propagation and cross filter on the destination table on that columns
can you tell me which is the count row of sale? we cannot answer -> we need to have the FILTER
( or said with no filter )
in Dax when there is a table we have to think that is subject to table, so is not THE
COMPLETE TABLE
FILTER CONTEXT
So we have the same measure but with different result, due filters
- filter empty -> means I see everything/no restrictions GRAND TOTAL in this case
EXERCISE
Out of each transaction of each continent I want to see in percentage the ones has been done in
JANUARY
a. keep the sales area but I want to apply a restriction to the JAN
FUNCTION FILTER – is a table function ; the filter will go row by row, and if the condition is met the
row will be present in the output.
THEN to the new output there is the function COUNTROWS
=COUNTROWS( FILTER( Sales; RELATED('Calendar'[YearMonth] = 201001
2. DATA MODELING -> are on the left and there are the 2 view ( data views and the model
view) – some collection has been already done by an AI algorithm
Collapsing the model shown only the connected through the relationships
3. VISUALISATION -> Matrix ( correspond to a pivot table)
the ROW, COLUMNS AND VALUES are the field of the pivot
table.
It adapts to the visualisation we are doing
Creation of a measure in power BI, which will be loaded in the
values -> to add a measure from the data on the right ci sono I 3
pallini vicino al name of the table – click add a measure.
Name of the measure = DAX formula we are interested in
b. Line visualization
We added a new measure -> qty = SUM( sales ( total quantity)) …
In the same sheet, we can create more visualisation and these INTERACT ONE WITH THE
OTHER
c. New measure -> Num of Active Cust = DISTINCTCOUNT(Sales[CustomerKey])
We add it as a card in the visualisation
d. Map -> we have
a. Location :
Create a duplicate -> keep only the matrix
New measures
a. Num of transactions -> countrows ( sale)
b. Restrict the num of transaction to female -> which are distincted in the customer table
Num of transaction F = COUNTROWS(FILTER(Sales,RELATED(Customer[Gender])="F"))
Related since is referring to a limitation od another table
The SUM function is not a Dax function -> it accept only a columns. Not access the table, to do
something more powerful is SUMX
The complex formula is SUMX ( table, column name) -> the X states that the imput has to go row
by row
When do we need it?
- When we have to sum expression
Example: we want the revenue, without wanting creating the column total sales
amount -> is created in the a temporary memory not physically
SUMX ( Sales, Sales (Unit Price) * Sales (Order Qauntity) )
- Elaborate on the table to iterate to do calculations.
Elaborate on the table to iterate to do calculations -> with the REVENUES
Revenue F = SUMX( FILTER( Sales, RELATED(Customer[Gender])= "F" ), Sales[UnitPrice]*Sales[OrderQuantity] )
ALL REVENUES -> does not consider the filter color ->
all revenues = SUMX( ALL(Sales), Sales[UnitPrice]*Sales[OrderQuantity] )
COUNTROWS ( table) = SUMX ( table,1)
OPERATORS:
FUNCTION AND
It checks several conditions
- Is the first logical condition > 18
- Is the second logical condition also > = 18
The and will write true if BOTH THE logical condition are TRUE
Function OR
It returns FALSE when all the logical condition are FALSE
The number of logical input may be infintite in EXCEL
The number of logical input can only be 2 in DAX
SO what do we use in the DAX?
We use the operators
Create REVENUES F in 2001
- We still can use the function AND -> since we have 2 conditions Revenue F 2001 = SUMX( FILTER( Sales, AND( RELATED(Customer[Gender])= "F", RELATED('Calendar'[CalendarYear])= 2001 ) ), Sales[UnitPrice]*Sales[OrderQuantity] )