Automated Email Sorting and Response System, Artificial Intelligence Notes

This document describes the development of an automated email sorting and response system for organizations. The system aims to improve email management by automatically categorizing emails by content and department and by generating appropriate responses. The document covers the system's architecture, data requirements, feature extraction, and machine learning setup. Key challenges include handling large email volumes, email variations, and spam/junk emails. The system leverages historical email data, natural language processing, and machine learning to improve sorting accuracy and response times. By automating these tasks, the system can increase productivity and streamline communication within the organization.

Type: Notes

2023/2024

Uploaded on 27/08/2024

EMAIL AUTOMATION SYSTEM
By Karla Almeida, Ana Isabel Mogrovejo Palomeque, and Jennifer Garcia
Final Evaluation
Machine Learning & AI
Alex Ramoneda
EU Business
Barcelona, Spain
22nd of March 2024

Table of Contents
CHAPTER 1: THE CASE
CHAPTER 2: ARCHITECTURE AND OPERATION OF THE SOLUTION PROPOSED
CHAPTER 3: DATA
CHAPTER 4: MACHINE LEARNING SETUP
REFERENCE LIST

Novelty

Although our idea is not a new concept, there is considerable room to improve the accuracy of these types of systems. The biggest users of machine learning for this kind of application include Gmail, Outlook, and Apple Mail (Totle, 2023). However, our group aims to create a simpler system that can be customized to each company's needs. Because our system would be trained on the company's historical emails, it would learn keywords and phrases that are specific to the company and to the way its employees write.

The Environment and the Stakeholders

It is also necessary to understand who the main stakeholders in this project will be. By understanding which stakeholders will be affected and which will be involved in the implementation, we can adapt our approach. As stated earlier, this type of system would be most effective for an organization with a functional organizational structure, in which the company is divided into departments. The system must be able to distinguish between these departments in order to sort emails accurately. The main stakeholders will include the employees in each department, the company's customers, and the IT department, which would need to provide the documentation required to train the system.

Companies will also need to keep in mind the budget, bandwidth, and training necessary for this type of implementation. Comparable services are offered at a relatively low price, and the bandwidth required should be relatively low. Comparable software subscriptions range from free to 60 USD per month for a personal account, depending on the user's needs (Glosson, 2024). Most email applications have some type of email sorting built in, yet each program has its own restrictions and chooses which challenges to focus on. For example, SaneBox is a comparable program that labels emails and moves them into different folders in the inbox. However, the program reportedly restricts users from forming their own rules and lacks some advanced email management tools (Glosson, 2024). Based on these comparisons, it can be inferred that the cost and bandwidth for an automatic email sorter would be low. Once the installation is complete, employees should be able to save time in their day.

Challenges

Within our group, we discussed our experiences with this type of system and found that some potential challenges may arise while implementing our plan. The common challenges that this type of system has to deal with are the massive volume of emails it has to sort through daily, the variation within these emails, and the presence of spam/junk emails (Totle, 2023). Totle further explains that the system needs to be able to understand a wide range of email templates. It is recommended that the system be trained on as much variety as possible so that it can differentiate between templates. This would enable the system to make predictions based on numerous factors such as the context, sender, receiver, and the template itself. While our system is not specifically designed to detect spam or junk email, with continued training it should be able to detect certain keywords that are more likely to be classified as trash/spam by its users.

Ethical concerns

Many organizations fear spam or scam emails, which put the company's security at risk. Since our system does not filter out this type of email, employees must keep a lookout for it. For example, the sales department might receive the most scam emails, since these often offer unrealistic opportunities aimed at the sales department. These concerns are usually mitigated by the IT department's security training, so our system will not handle this ethical dilemma.

encryption programs to protect the data, analytics tools to get insights from the reports, and any integration tools if needed. We assume that this software will already be installed in the company. Services will include IT support for any department or staff member that requires assistance, training services to familiarize employees with the new system and, if necessary, consulting from experts to help the team design and train the system.

Storage policy of the data

The data will be used, stored, and secured according to the policies of the company. A storage and privacy policy should be created or modified and then implemented to ensure the correct use of the data for the system. The main elements to be taken into account in this policy are the data classification, the data retention periods, the cloud where the data will be stored, the access controls, how the data will be deleted when it reaches the storage time limit, how the policy complies with the relevant regulations, how the data and the policy will be monitored and reported, and finally the training that will be given to users of the system on best practices in data security, protection, and compliance (Krstic, 2021) (Various, 2023). The main storage of this data will be external, on Microsoft Cloud, but the data will also be kept on the company's internal servers, both secured in line with the policy described above. Only approved members of the company can be users of this system, and the same members will be able to access this information, mainly the colleagues who have access to the shared inbox.

Training and Inferences

Training will be carried out in the Microsoft ecosystem, using Power Automate, a “cloud-based service that allows users to automate workflows across multiple applications to streamline repetitive tasks and integrate various systems and services without the need for extensive coding or development expertise.” (Microsoft, 2024). The flexibility of this tool will also help to run inferences with the trained model to check whether it is working properly. To test with real-life examples, staff from the different departments will be needed to try out the system and report any errors.

End-user perspective

The end users of this system will be the staff of every department, especially the operational staff. From their perspective, the system will be used on their desktop computers or laptops and, if the company allows it, on their smartphones. The operations that the system will carry out include categorizing emails according to the department they are intended for, sending an automatic response when a template triggers this action, and keeping a record of the emails answered so that a report can be produced at the end of the month. More actions can be added if needed, thanks to the flexibility of the tool used. With the system implemented, end users will save time and be more productive in other tasks.

Expert domain

The contribution of experts will be needed at different stages. The first is data preprocessing, where they prepare the input for the model. The second is training, where they train the machine learning model on historical data and optimize it. The third is model inference, testing, and validation, where they make sure the model works properly in real scenarios and detect biases and issues. Finally, they will monitor the model's performance and any anomalies it may show. Experts will also ensure that the security and data privacy policies are satisfied throughout the process (Martineau, 2023) (AI-Jobs, 2023).

Feature Selection Heuristics and Sampling

Heuristics can be used to guide the learning of the system. For feature selection, the experts can add relevant features such as established templates or words found in emails that belong to a specific department (Qi, Li and Li, 2017). Sampling heuristics can help if the input data is too large and too computationally expensive for the company to process, in which case subsets of the data are created for training (Pang, 1992).

System Performance

To evaluate the performance of the system, different metrics will be used along with a confusion matrix. The metrics will include: precision, the proportion of emails assigned to a department that actually belong to it, with a goal above 95%; accuracy, the overall proportion of emails classified correctly, which should also be above 95% (Iqbal and Shehrayar Khan, 2022); the email production time, which tracks the time from when an email is received to when it is answered and should be less than 1 business day for automatic responses and 2 business days for those that require a human response (Knapp, 2021); the deliverability rate, which reflects whether emails were delivered to the inbox or went to spam and should be above 98% (Knapp, 2021); and finally the email forwarding rate, which indicates how many times the email was

CHAPTER 3: DATA

Format of the Raw Data

This type of system would require raw data in the form of text and times/dates from the company's emails. This data is used to teach the program how to identify keywords from each department and to find correlations between emails and their intended department.

Data Representation in the System

The system should be able to analyze different data representations such as the email metadata, the contents of the email, and the email's labels. The metadata includes the recipient, sender, subject, time/date, and attachments. Furthermore, the system can analyze the text itself, text length, email formatting, and text sentiment. One of the most important aspects for categorizing emails is their labels. If the company's emails have been correctly labeled with the right department, the system can gather all these aspects of an email and find the correlations among them within a specific department.

Preprocessing of data & Data Cleansing

Preprocessing and data cleansing include removing data noise and censoring sensitive information that could endanger personal data. This can be accomplished by eliminating irrelevant emails, removing stop words from the dataset, and utilizing tokenization. To begin the preprocessing, we must eliminate any irrelevant emails within our dataset (Emelianov, 2024). The article explains that irrelevant emails include spam and junk mail. By removing them from the dataset, we can ensure that the system does not learn from information that leads to inaccurate conclusions. This removes some of the noise within the dataset. Moreover, by creating a list of stop words such as “is”, “the” and “of”, the system will be cleansed of words that do not provide useful insight into which category an email belongs to.
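The cleansing steps described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the `label` and `body` field names, the short stop-word list, the `<email>` placeholder, and the `preprocess` function are all hypothetical. Simple word-level splitting stands in for the tokenization step, and masking email addresses stands in for the censoring of sensitive information.

```python
import re

# Hypothetical stop-word list; a real deployment would use a fuller one.
STOP_WORDS = {"is", "the", "of", "a", "an", "and", "to", "for", "in", "on"}

# Simple pattern for email addresses, used to mask sensitive data.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def preprocess(emails):
    """Drop spam/junk emails, mask addresses, tokenize, and remove stop words."""
    cleaned = []
    for email in emails:
        # Step 1: discard irrelevant emails (assumes a 'label' field exists).
        if email["label"] in {"spam", "junk"}:
            continue
        # Step 2: lowercase the text and replace addresses with a placeholder.
        text = EMAIL_PATTERN.sub("<email>", email["body"].lower())
        # Step 3: split into word tokens, keeping the placeholder intact.
        tokens = re.findall(r"<email>|[a-z']+", text)
        # Step 4: remove stop words that carry no department signal.
        tokens = [t for t in tokens if t not in STOP_WORDS]
        cleaned.append({"label": email["label"], "tokens": tokens})
    return cleaned


if __name__ == "__main__":
    sample = [
        {"label": "finance",
         "body": "The invoice of March is attached. "
                 "Contact Billing@Example.com for details."},
        {"label": "spam", "body": "Win a prize today"},
    ]
    for item in preprocess(sample):
        print(item["label"], item["tokens"])
```

The spam email is dropped before any text processing, so it never contributes words to the training data.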
Removing stop words takes more noise out of the dataset and offers our system more reliable data. Finally, our team will need to integrate tokenization into our database. Tokenization has been described as “the process of creating a digital representation of a real thing. Tokenization can be used to protect sensitive data or to efficiently process large amounts of data” (McKinsey & Company, 2024). This preprocessing step can help our clients feel more secure about their sensitive information and ensure that our system is not learning to recognize this information.

Properties of the Dataset and Quality Measures

To ensure that the data we receive is reliable, we need to be certain of the emails' quality, utilize lemmatization, and perform quality checks on the accuracy of our system's categorization. These steps will balance the dataset so that it is more representative of each of the company's departments. The most important step in ensuring that the dataset is reliable is to check that the emails have been correctly labeled. This ensures that the system learns the correct information for each label. Additionally, lemmatization is another recommended preprocessing step; it is a “natural language processing (NLP) model to break a word down to its root meaning to identify similarities” (Srinidhi, 2023). Lemmatization has been stated to increase accuracy, since it helps the system understand the overall concepts of the emails. With a better overall understanding of the emails, the system can make a more precise decision about each email's intended department. Furthermore, it is recommended that we use supervised machine learning: “to evaluate the performance of a supervised learning model for email classification, metrics such as accuracy, precision, recall, and F1 score are commonly used.” (Emelianov, 2024). In this way we can ensure that the system performs as intended and that it has an accurate representation of each department.

Size of the data available

The size of the data can also affect the system's ability to learn the connections between the departments and the elements of the emails. If the system does not have enough emails, or if those emails do not have enough variety, its ability to predict the correct department can be dramatically impaired. For example, if all the emails the system takes in are from the sales and marketing departments, the program may incorrectly label an email for the finance department as sales. The size of the data depends on the company's historical emails that are available to the system. If the company is relatively young or has deleted its older emails, the system will need to rely solely on a low-level “bag-of-words” model to form connections with each department, and it will be fine-tuned once it is installed in the company. Once installed, the system will be able to take in user actions and draw further connections.
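As a rough illustration of the bag-of-words idea and of the per-department precision and recall mentioned in this chapter, the sketch below counts words per department and scores a new email with a naive Bayes rule. Naive Bayes is an assumption chosen for brevity; the report does not name a specific algorithm. The function names and sample emails are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """Count words per department from (tokens, department) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for tokens, department in samples:
        word_counts[department].update(tokens)
        doc_counts[department] += 1
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocabulary


def predict(tokens, model):
    """Return the department with the highest log-probability score."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_department, best_score = None, -math.inf
    for department in doc_counts:
        # Prior: share of training emails that belong to this department.
        score = math.log(doc_counts[department] / total_docs)
        total_words = sum(word_counts[department].values())
        for token in tokens:
            # Add-one smoothing so unseen words do not zero out the score.
            count = word_counts[department][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_department, best_score = department, score
    return best_department


def precision_recall(true_labels, predicted_labels, department):
    """Compute precision and recall for one department."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == department and p == department)
    fp = sum(1 for t, p in pairs if t != department and p == department)
    fn = sum(1 for t, p in pairs if t == department and p != department)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    model = train([
        (["invoice", "payment", "overdue"], "finance"),
        (["budget", "invoice", "report"], "finance"),
        (["discount", "offer", "customer"], "sales"),
        (["quote", "customer", "order"], "sales"),
    ])
    print(predict(["invoice", "budget"], model))
    # A finance email mislabelled as sales lowers precision for sales.
    print(precision_recall(["finance", "sales", "finance"],
                           ["finance", "sales", "sales"], "sales"))
```

Because the prior is the share of training emails per department, a dataset dominated by one department pushes predictions toward that department, which is the imbalance risk described above.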

CHAPTER 4: MACHINE LEARNING SETUP

In this chapter, the project shall explain the training stage and put it in operation. Some topics that could be considered:
· Type of problem (classification, regression, outlier detection)
· Type of Learning (supervised, unsupervised, reinforced)
· Algorithm selection (keyword), including error function and optimization method
· Overfitting and underfitting. Risks and mitigation
· Bias and Variance. Risks and mitigation
· Strategy and split between training data, validation data, and test data
· Training ensemble and iterations. Combining algorithms. Parallel, series, cascading
· Requirements of Explainability, Loss and Accuracy
· Explain the Flow (according to the following chart)
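For the split between training, validation, and test data listed above, one possible approach is a stratified split, which keeps each department represented in every subset. The sketch below is only an illustration: the 70/15/15 ratio, the fixed seed, and the `stratified_split` function name are assumptions, not choices stated in the report.

```python
import random
from collections import defaultdict


def stratified_split(samples, train=0.7, validation=0.15, seed=0):
    """Split (email, department) pairs, keeping department shares similar."""
    # Group the samples by department so each group is split separately.
    by_department = defaultdict(list)
    for sample in samples:
        by_department[sample[1]].append(sample)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    splits = {"train": [], "validation": [], "test": []}
    for group in by_department.values():
        rng.shuffle(group)
        n_train = round(len(group) * train)
        n_val = round(len(group) * validation)
        splits["train"] += group[:n_train]
        splits["validation"] += group[n_train:n_train + n_val]
        # Whatever remains after training and validation becomes test data.
        splits["test"] += group[n_train + n_val:]
    return splits


if __name__ == "__main__":
    data = [(f"email {i}", "sales") for i in range(20)]
    data += [(f"email {i}", "finance") for i in range(20, 40)]
    parts = stratified_split(data)
    print({name: len(items) for name, items in parts.items()})
```

With very small departments some subsets may end up empty for that department, so the ratios would need adjusting for companies with little historical email.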

REFERENCE LIST

AI-Jobs (2023). Model inference explained. [online] ai-jobs.net. Available at: https://ai-jobs.net/insights/model-inference-explained/ [Accessed 22 Mar. 2024].
Concannon, L. (2023). How to Measure Email Campaign Performance. [online] Meltwater. Available at: https://www.meltwater.com/en/blog/measure-email-campaign-performance.
Emelianov, D. (2024). Utilizing Machine Learning for Efficient Email Sorting. [online] Trimbox. Available at: www.trimbox.io/blog/utilizing-machine-learning-for-efficient-email-sorting [Accessed 20 Mar. 2024].
Glosson, M. (2024). 7 Best Email Sorter Software And Apps To Organize Inbox. [online] clean.email. Available at: https://clean.email/how-to-sort-emails/best-email-sorter [Accessed 20 Mar. 2024].
Iqbal, K. and Shehrayar Khan, M. (2022). Email classification analysis using machine learning techniques. [online] Emerald Insights. Available at: https://www.emerald.com/insight/content/doi/10.1108/ACI-01-2022-0012/full/html.
McKinsey & Company (2024). What is tokenization? | McKinsey. [online] www.mckinsey.com. Available at: https://www.mckinsey.com/featured-insights/mckinsey-explainers/what-is-tokenization#/ [Accessed 20 Mar. 2024].
Microsoft (2024). AI-Powered Reply Generator. [online] Available at: https://appsource.microsoft.com/en/product/office/wa200003032?tab=overview [Accessed 21 Mar. 2024].
Srinidhi, S. (2023). Lemmatization in Natural Language Processing (NLP) and Machine Learning. [online] Built-In. Available at: builtin.com/machine-learning/lemmatization [Accessed 20 Mar. 2024].
Totle (2023). The Role of Machine Learning in Email Sorting and Prioritization. [online] Available at: www.linkedin.com/pulse/role-machine-learning-email-sorting-prioritization-totle-tjxtc [Accessed 20 Mar. 2024].