


Data science presentation script



Intro

Today, I would like to present the topic "Predicting Fraudulent Financial Transactions", an important issue in finance and data technology. Our presentation will walk through the report's key sections: the dataset overview, methodology, experiment, results, discussion, and conclusions.

Problem Formulation

Financial fraud is becoming increasingly complex. It now occurs not only in high-value transactions but also in frequent, low-value ones, especially during low-surveillance periods. To investigate this, we analyzed 1.75 million transactions from simulated users across various terminals, covering the period from January to June
identified as the most effective model, we will choose this method to improve fraud detection accuracy.

Experiment

Introduction of dataset

We used the "Fraudulent Transaction Detection" dataset from Kaggle, a reputable online platform for research and learning. We used it to address our research questions, aiming to predict the likelihood of fraud in financial transactions and to draw the final conclusions for our project. Below is the data structure of the "Fraudulent Transaction Detection" dataset, including each attribute, its meaning, and its role in the dataset.

Starting from the full dataset of 1.75 million instances, we reduced it to 7, instances, aiming to keep the sample representative while dealing with a severe class imbalance (fraudulent transactions make up a far smaller share than legitimate ones).

Data preprocessing

Since the dataset contained no missing values or noisy data, we kept the original values without adjustment. Using the Rank widget in Orange, we identified five variables (CUSTOMER_ID, TX_TIME_SECONDS, TRANSACTION_ID, TX_TIME_DAYS, and TERMINAL_ID) as having minimal impact on fraud detection, and removed them to improve model efficiency. We also renamed certain variables with the Edit Domain widget to make the analysis easier to follow.

Research Question 1: What characteristics distinguish fraudulent financial transactions?

To answer this question, we analyzed the dataset for key patterns that set fraudulent transactions apart from legitimate ones. First, we examined transaction amounts. While many people assume that fraudulent transactions are usually high-value, our findings showed otherwise: fraudulent transactions appeared across various value ranges, including small and moderate amounts. This suggests that fraudsters may deliberately make low-value transactions to avoid detection.
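One way to check the amount finding is to bucket transactions into value bands and compare how many transactions and fraud cases fall in each band. The sketch below is a minimal illustration in plain Python, not the Orange workflow used in the project; the band edges, the field names TX_AMOUNT and TX_FRAUD, and the toy records are all hypothetical assumptions chosen for the example.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical band edges (currency units); choose edges that suit the real data.
EDGES = [20, 100, 500]
LABELS = ["<20", "20-99", "100-499", ">=500"]


def band(amount, edges=EDGES, labels=LABELS):
    """Map a transaction amount to its value-band label."""
    # bisect_right returns how many edges are <= amount, i.e. the band index.
    return labels[bisect_right(edges, amount)]


def band_summary(transactions):
    """Per band: number of transactions, number of frauds, and fraud rate."""
    totals = Counter()
    frauds = Counter()
    for t in transactions:
        b = band(t["TX_AMOUNT"])
        totals[b] += 1
        frauds[b] += t["TX_FRAUD"]  # assumed 0/1 fraud label
    return {
        label: {
            "n": totals[label],
            "fraud": frauds[label],
            "fraud_rate": frauds[label] / totals[label] if totals[label] else 0.0,
        }
        for label in LABELS
    }


# Hypothetical toy records, not taken from the Kaggle dataset.
txs = [
    {"TX_AMOUNT": 5.0, "TX_FRAUD": 1},
    {"TX_AMOUNT": 12.5, "TX_FRAUD": 0},
    {"TX_AMOUNT": 45.0, "TX_FRAUD": 1},
    {"TX_AMOUNT": 60.0, "TX_FRAUD": 0},
    {"TX_AMOUNT": 250.0, "TX_FRAUD": 0},
    {"TX_AMOUNT": 800.0, "TX_FRAUD": 1},
]

if __name__ == "__main__":
    for label, stats in band_summary(txs).items():
        print(label, stats)
```

Fraud counts that are non-zero in the low bands, and not only in the top band, are the kind of evidence behind the claim that fraud is not confined to high-value transactions. Reporting the per-band fraud rate alongside raw counts helps because, under heavy class imbalance, raw fraud counts alone can hide how risky a small band actually is.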
Next, we explored the fraud scenarios in the dataset. Among the identified fraud cases, Fraud Scenario 1 was the most common. This scenario often involved a series of rapid transactions from the same terminal or customer within a short period, which matches the "burst fraud" tactic: many low-value transactions processed quickly to slip past security systems. These findings show that fraud is not limited to large amounts; it also occurs in smaller, frequent transactions. Identifying these patterns is crucial for building effective fraud detection models.

Data Classification (RQ2)
model to spot rare fraud cases. Another challenge was that the dataset had limited detail, which could prevent the model from detecting more complex fraud patterns.

Future Direction

First, some transactions lacked clear fraud labels; techniques such as clustering or semi-supervised learning could help identify patterns in these cases. Second, to address the class imbalance, methods such as SMOTE or decision-threshold adjustment could improve the model's ability to detect rare fraud cases. Lastly, adding features such as transaction frequency, merchant type, or location data may help uncover more complex fraud patterns. Together, these improvements could further increase fraud detection accuracy.

Conclusion