data science script presentation | Schemes and Mind Maps Data Mining

Intro

Today, I would like to present on the topic "Predicting Fraudulent Financial Transactions", an important issue in finance and data technology. Our presentation will outline the report’s key sections: the dataset overview, methodology, experiment, results, discussion, and conclusions.

Problem Formulation

Financial fraud is becoming increasingly complex. It now occurs not only in high-value transactions but also in frequent, low-value ones, especially during low-surveillance periods. To investigate this, we analyzed 1.75 million transactions from simulated users across various terminals, covering the period from January to June 2023. Our goal is to identify the key characteristics of fraudulent transactions and to propose an effective prediction method that improves detection accuracy.

Research questions

In our study, we focus on two key research questions. First, "What characteristics distinguish fraudulent financial transactions?" Second, "How can financial transaction fraud be predicted accurately using historical transaction data?"

The data show that there are many reasons for fraudulent transactions. To improve accuracy and keep our conclusions focused, we aim to identify the main factors that lead to financial fraud. This is why it is our first research question.

Because the model is built on past data, we also want to know whether it can still make accurate predictions when it is put to use. This lets us measure how well the model predicts new data, and it is why we chose this as our second research question.

Related work

In “Using machine learning Meta-Classifiers to detect financial frauds”, Achakzai and Juan (2022) employed machine learning meta-classifiers to detect financial fraud. They found that meta-classifiers can outperform the best stand-alone classifiers across different performance metrics and improve predictive performance over traditional statistical methods.

Methodology

Our study relied on two things: Orange as the analytical tool and data classification as the analytical technique. Orange is an open-source data mining and visualization platform that allowed us to analyze data, build models, and visualize results effectively. Data classification is the process of predicting or inferring the class (or multiple classes) of a given data object based on a predefined classification model. We tested three models, namely Logistic Regression, Decision Tree, and Support Vector Machine (SVM), to predict
fraudulent transactions. After evaluating their performance, we will choose whichever model is identified as the most effective in order to improve fraud detection accuracy.

Experiment

Introduction of dataset

We utilized the "Fraudulent Transaction Detection" dataset, sourced from Kaggle, a reputable online platform for research and learning. We used this dataset to address our research questions, to predict the likelihood of fraud in financial transactions, and to derive the final conclusions of our project.

Below is the data structure of the "Fraudulent Transaction Detection" dataset, including its attributes, meanings, and roles within this dataset.

From the dataset containing 1.05 million instances, we decided to reduce to 7, instances to minimize the impact on data representativeness while dealing with a severe class imbalance (where fraudulent transactions account for a significantly smaller proportion than legitimate transactions).

Data preprocessing

Since the dataset contained no missing values or noisy data, we retained the original data without adjustments. Using the Rank widget in Orange, we identified five variables (CUSTOMER_ID, TX_TIME_SECONDS, TRANSACTION_ID, TX_TIME_DAYS, and TERMINAL_ID) as having minimal impact on fraud detection. We removed these variables to improve model efficiency. We also renamed certain variables with the Edit Domain widget for clarity during analysis.

Research Question 1: What characteristics distinguish fraudulent financial transactions?

To answer this question, we analyzed the dataset to identify key patterns that set fraudulent transactions apart from legitimate ones.

First, we examined the transaction amounts. While many people assume that fraudulent transactions are usually high-value, our findings showed otherwise. Fraudulent transactions appeared across various value ranges, including small and moderate amounts. This suggests that fraudsters may intentionally conduct low-value transactions to avoid detection.

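The comparison of transaction amounts by class can be sketched roughly as follows. This is a minimal illustration in plain Python, not the Orange workflow used in the study. The column names `TX_AMOUNT` and `TX_FRAUD`, the helper `amount_summary`, and the sample rows are all assumptions made up for this example.

```python
from statistics import median


def amount_summary(transactions, amount_key="TX_AMOUNT", label_key="TX_FRAUD"):
    """Group amounts by class label and return count, min, median and max.

    The key names are assumed for illustration; adjust them to the
    actual column names in the dataset.
    """
    by_label = {}
    for tx in transactions:
        by_label.setdefault(tx[label_key], []).append(tx[amount_key])
    return {
        label: {
            "n": len(amounts),
            "min": min(amounts),
            "median": median(amounts),
            "max": max(amounts),
        }
        for label, amounts in by_label.items()
    }


# Made-up rows for illustration only; these are not taken from the dataset.
sample = [
    {"TX_AMOUNT": 12.5, "TX_FRAUD": 0},
    {"TX_AMOUNT": 80.0, "TX_FRAUD": 0},
    {"TX_AMOUNT": 45.0, "TX_FRAUD": 0},
    {"TX_AMOUNT": 9.9, "TX_FRAUD": 1},
    {"TX_AMOUNT": 230.0, "TX_FRAUD": 1},
]

summary = amount_summary(sample)
for label, stats in sorted(summary.items()):
    print(label, stats)
```

If the fraud class shows a low minimum and a wide range, as in these invented rows, that would be consistent with fraud not being confined to high-value transactions.
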
Next, we explored fraud scenarios in the dataset. Among the identified fraud cases, Fraud Scenario 1 was the most common. This scenario often involved a series of rapid transactions from the same terminal or customer within a short period. This pattern aligns with the “burst fraud” tactic, where multiple low-value transactions are processed quickly to bypass security systems.

These findings highlight that fraudulent transactions are not limited to large amounts but can also occur in smaller, frequent transactions. Identifying these patterns is crucial for developing effective fraud detection models.

Data Classification (RQ2)
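
One way a fraud classifier could be scored on imbalanced data is sketched below, under stated assumptions. The script does not say which metrics were used, so precision and recall on the fraud class are shown only as a common choice. The function name `fraud_metrics`, the labels, and the scores are hypothetical and made up for illustration; they are not results from the study.

```python
def fraud_metrics(y_true, scores, threshold=0.5):
    """Count confusion-matrix cells and derive precision, recall and accuracy.

    y_true holds 1 for fraud and 0 for legitimate; scores are assumed to be
    model-estimated fraud probabilities. A transaction is flagged as fraud
    when its score is at or above the threshold.
    """
    tp = fp = fn = tn = 0
    for truth, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1 and truth == 0:
            fp += 1
        elif predicted == 0 and truth == 1:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }


# Invented labels and scores: 8 legitimate transactions, 2 fraudulent ones.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.15, 0.30, 0.60, 0.10, 0.25, 0.40, 0.90]

at_half = fraud_metrics(y_true, scores, threshold=0.5)
at_lower = fraud_metrics(y_true, scores, threshold=0.35)
print(at_half)
print(at_lower)
```

With these invented scores, lowering the threshold from 0.5 to 0.35 raises recall on the fraud class from 0.5 to 1.0, because the fraud case scored 0.40 is now flagged.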

model to spot rare fraud cases. Another challenge was that the dataset had limited details, which could prevent the model from detecting more complex fraud patterns.

Future Direction: First, some transactions lacked clear fraud labels. Techniques like clustering or semi-supervised learning could help identify patterns in these cases. Second, to address data imbalance, methods such as SMOTE or threshold adjustments could improve the model’s ability to detect rare fraud cases. Lastly, adding features like transaction frequency, merchant types, or location data may help uncover more complex fraud patterns. These improvements can further enhance fraud detection accuracy.

Conclusion
