SERI: A Generative Chatbot Framework for Cybergrooming Prevention

This paper describes the development of SERI, a generative chatbot framework designed to help prevent cybergrooming by generating authentic conversations between a perpetrator chatbot and a potential victim chatbot. The framework pre-trains and fine-tunes chatbots based on the T5 model and uses the PJ dataset for fine-tuning and evaluation. Cybergrooming is a serious crime involving the establishment of trust relationships with potential victims, often youth, for sexual exploitation or abuse. The paper discusses the challenges in developing a chatbot for this purpose, which stem from the lack of sufficient datasets and the need for authentic conversations. It also reviews the use of machine learning algorithms and language models for detecting cybergrooming and the importance of understanding the evolving conversations between perpetrators and victims.



SERI: Generative Chatbot Framework for Cybergrooming Prevention

Pei Wang, Zhen Guo, Lifu Huang, Jin-Hee Cho

Computer Science, Virginia Tech, VA, USA

{pwang1, zguo, lifuh, jicho}@vt.edu

Abstract

Cybergrooming refers to the crime of luring potential victims, particularly youth, by establishing personal trust relationships with them for sexual abuse or exploitation. Although cybergrooming is recognized as a serious social issue, there has been a lack of proactive programs to protect youth. In this paper, we present a generative chatbot framework, called SERI (Stop cybERgroomIng), that can generate authentic conversations between a perpetrator chatbot and a potential victim chatbot. The SERI is designed to provide a safe and authentic environment for enhancing youth's sensitivity to and awareness of subtle cues of cybergrooming without raising unnecessary ethical issues caused by potentially offensive or upsetting language. The SERI is developed as a pre-stage before the perpetrator chatbot is deployed to chat with an actual human youth user to observe how the youth user responds to a stranger or acquaintance asking for sensitive or private information. To evaluate the quality of the conversations generated by the SERI, we use open-source referenced and unreferenced metrics to assess the generated conversations automatically. In addition, we evaluate the quality of the conversations through human evaluation. Our results show that the SERI can generate authentic conversations between the two chatbots compared to the original conversations from the dataset used, in terms of perplexity and MaUde scores.

1 Introduction

As of 2017, approximately one-third of online users in the world are young people below the age of 18 (UNICEF, 2017). Although the Internet has provided countless benefits in our everyday life, it has also introduced serious concerns about online sexual exploitation and abuse of children (Choo, 2009; Marchenko, 2017). Cybergrooming refers to the crime of establishing a personal trust relationship with potential victims, commonly youth, via the Internet solely for sexual exploitation or abuse (Choo, 2009). In the US, from 1998 to 2013, the CyberTipline (Koons Family Institute on International Law and Policy, 2017) received 60,000 cases of luring children for sexual purposes in cyberspace. Due to the seriousness of cybergrooming, some studies have investigated the key properties of cybergrooming or developed tools to detect online child sexual exploitation or predators (Anderson et al., 2019; Bours and Kulsrud, 2019; Fauzi and Bours, 2020). In computer science, the majority of cybergrooming studies have focused on detecting predators by analyzing malicious conversations. However, detection does not provide any proactive prevention to protect potential youth victims from cybergrooming. For this reason, this work is motivated to develop a proactive cybergrooming prevention program to increase youth's awareness of and sensitivity to cybergrooming and its serious consequences. To this end, we aim to develop a generative chatbot framework that can provide authentic conversations between a perpetrator chatbot and a youth user chatbot, with the ultimate aim of stopping cybergrooming. We named this generative chatbot framework SERI (Stop cybERgroomIng). The SERI will be used as a pre-stage to provide a safe and authentic environment before deploying the perpetrator chatbot with a real human youth user. The SERI will provide a safe environment in which a youth user can engage in an authentic conversation with a stranger or acquaintance and learn how to deal with a person talking about sensitive or private issues.

In developing the authentic, generative chatbot framework, SERI, to mimic the conversations between a perpetrator and a potential victim, we found the following research challenges. First, unlike general casual talk, the perpetrator is very goal-oriented, leading conversations and striving to achieve a final goal, such as meeting in person. The perpetrator gradually establishes a trust relationship with a potential victim by asking a series of questions about the potential victim's private life. Second, it is highly challenging to develop a chatbot generating authentic conversations because of a lack of datasets sufficient to train the SERI. The only available dataset is the Perverted Justice (PJ) dataset (Perverted Justice Foundation Inc., 2020), consisting of the conversations between cybergrooming perpetrators and professionally trained volunteers playing the role of potential youth victims. However, the volume of the PJ dataset is limited (i.e., 100 sets of conversations), and it contains highly informal language, such as short abbreviations, slang, unsegmented words, emojis, or URLs. The poor quality of the training data makes it significantly challenging to train the SERI chatbot model directly. To tackle these challenges, we made the following key contributions via our developed SERI:

  1. We applied a two-stage paradigm to train the SERI with the T5 (Text-to-Text Transfer Transformer) model (Raffel et al., 2020), where both the perpetrator and victim chatbots were first pre-trained on general, large-scale casual talk datasets, such as ConvAI2 (The Second Conversational Intelligence Challenge dataset) (Dinan et al., 2019). Afterwards, the chatbots were fine-tuned on the PJ dataset, which was pre-processed with a series of social text normalization tools to mitigate the effect of highly informal language (e.g., slang, online abbreviations, unsegmented words, emojis, or URLs).
  2. We modeled the multi-stage strategies that the perpetrators can take to evolve the relationship with a potential victim and achieve the goal of meeting in person. To achieve this, we defined four grooming stages based on the evolution of the relationships and predicted a stage for each utterance by encoding the utterance through BERT (Bidirectional Encoder Representations from Transformers) (Devlin et al., 2019). We accordingly trained four perpetrator subchatbots using the T5 model. Each perpetrator subchatbot was trained on a set of utterances from the PJ dataset, with the stage labels of the utterances predicted by a BERT-based stage classifier.
  3. We developed a mechanism to escalate the attack stages and coordinate the dialogue generation with the four perpetrator subchatbots. The perpetrator will move to the next stage by switching from the current subchatbot to the next-stage subchatbot if the perpetrator successfully obtains all information from a potential victim while the potential victim still stays in the conversation. If the potential victim leaves the chat, the perpetrator's attack fails.
  4. We evaluated the SERI by using both referenced metrics (i.e., BLEU (Post, 2018), ROUGE (Lin, 2004), and BERTScore (Zhang et al., 2020)) and unreferenced metrics (i.e., perplexity and MaUde scores (Sinha et al., 2020)). We found that the conversations generated by the SERI showed better performance than the ground truth conversations based on all metrics above. In addition, our human evaluation confirms that about 37% of the utterances generated by the SERI are valid and better than ground truth utterances from the PJ dataset.

2 Related Work

Cybergrooming detection. Many Machine Learning (ML) algorithms, such as support vector machine (SVM) (Anderson et al., 2019; Dhouioui and Akaichi, 2016; Fauzi and Bours, 2020; Gunawan et al., 2018), fuzzy logic (Anderson et al., 2019), k-nearest neighbors (KNN) (Gunawan et al., 2018), Random Forest (Fauzi and Bours, 2020), Naïve Bayes (Bours and Kulsrud, 2019), Decision Tree (Fauzi and Bours, 2020), and Neural Network (NN) classifiers (Bours and Kulsrud, 2019; Fauzi and Bours, 2020), have been used to detect cybergrooming based on lexical features (e.g., Term Frequency-Inverse Document Frequency (TF-IDF) features or Bag of Words features) and behavioral features from the text. To understand the evolving conversations between perpetrators and victims, researchers have also investigated multiple relational stages of cybergrooming (Winters and Jeglic, 2016). However, while most efforts focused on the grooming stages and prevention methods (Zambrano et al., 2019), no prior research has characterized the features of cybergrooming victims.

Chatbot application tools. A chatbot, called Negobot, was developed to detect potential pedophiles in social networks (Laorden et al., 2013). A game-theoretic reward metric could move the chatbot toward the next conversation stage or maintain the current stage. Recently, pre-training lan-

Figure 2: A sample training unit for the perpetrator and pseudo-user (i.e., potential victim) chatbots.
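A training unit of the kind Figure 2 depicts (two dialogue turns, i.e., four sentences, with the last sentence as the target) can be assembled in a few lines. The sketch below is only an illustration under stated assumptions: the dialogue is stored as (speaker, text) pairs, the window slides one sentence at a time, and the source sentences are joined with a space. The name `build_units`, the stride, and the separator are hypothetical and not taken from the paper.

```python
from typing import List, Tuple

Utterance = Tuple[str, str]  # (speaker, text)


def build_units(dialogue: List[Utterance], role: str,
                unit_size: int = 4, sep: str = " ") -> List[Tuple[str, str]]:
    """Return (source, target) pairs whose target sentence is spoken by `role`.

    Each unit covers `unit_size` consecutive sentences: the last one is the
    target (ground truth response), the preceding ones form the source
    (dialogue history). The one-sentence stride is an assumption.
    """
    units = []
    for end in range(unit_size, len(dialogue) + 1):
        window = dialogue[end - unit_size:end]
        speaker, target = window[-1]
        if speaker != role:
            continue  # this window would train the other chatbot
        source = sep.join(text for _, text in window[:-1])
        units.append((source, target))
    return units


# Neutral, ConvAI2-style example; the leading speaker takes the perpetrator role.
dialogue = [
    ("perpetrator", "hi do you have any hobbies"),
    ("victim", "i like hiking"),
    ("perpetrator", "where do you hike"),
    ("victim", "near the lake"),
    ("perpetrator", "sounds fun"),
]
victim_units = build_units(dialogue, "victim")
perpetrator_units = build_units(dialogue, "perpetrator")
```

Filtering on the speaker of the last sentence keeps the two chatbots' training sets separate, which matches the paper's statement that the perpetrator and victim chatbots are trained separately.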

refined the six stages in (Zambrano et al., 2019) to develop four new stages for the grooming process. We summarize the key conversation contents covered by each stage in Table 1. We can simplify the six stages by merging similar original stages. That is, we merge stages $s_1$ and $s_4$ as new stage $\tilde{s}_1$, stages $s_2$ and $s_3$ as new stage $\tilde{s}_2$, stage $s_5$ as new stage $\tilde{s}_3$, and stage $s_6$ as new stage $\tilde{s}_4$. In the end, each utterance in the PJ dataset can obtain a label from the four new stages based on the BERT classifier.
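The merging rule above amounts to a small lookup table. A minimal sketch, assuming the original stages are numbered 1 to 6 and the merged stages 1 to 4 (the names `MERGED_STAGE` and `relabel` are hypothetical):

```python
from typing import Dict, List

# Original stage index -> merged stage index, following the merging rule:
# s1, s4 -> s~1;  s2, s3 -> s~2;  s5 -> s~3;  s6 -> s~4.
MERGED_STAGE: Dict[int, int] = {1: 1, 4: 1, 2: 2, 3: 2, 5: 3, 6: 4}


def relabel(original_stages: List[int]) -> List[int]:
    """Map a sequence of six-stage labels to the four merged stages."""
    return [MERGED_STAGE[s] for s in original_stages]


merged = relabel([1, 2, 3, 4, 5, 6])
```

A label outside 1 to 6 raises `KeyError`, which makes unexpected classifier output visible instead of silently mapping it to a stage.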

Pre-training the chatbots on the ConvAI2 dataset. For each role of the perpetrator and potential victim, we build a chatbot model using T5 (Raffel et al., 2020) with the PyTorch framework. As the in-domain PJ dataset is small, to improve the fluency of the generated conversations, we first pre-train T5 with the large-scale ConvAI2 dataset, which contains high-quality general conversations.

To train the T5-based chatbots, we concatenate two dialogue turns (i.e., four sentences) as a unit, take the last sentence as the target (i.e., ground truth response), and treat the preceding sentences as the source (i.e., dialogue history). Figure 2 shows a sample chatbot training unit of four sentences. The conversations in the ConvAI2 dataset are usually between two persons, where the one initiating the conversation plays a leading role with more leading topics or questions. Since this leading role matches a perpetrator's nature, for all conversations in the ConvAI2 dataset, we treat the leading person as the perpetrator and the other one as the potential victim and formulate the training utterances accordingly. Following (Raffel et al., 2020), given an input sequence $x$ as the source, we generate the response by optimizing the following objective:

$$L = -\sum_{i} \log P(y_i \mid y_{i-k}, \ldots, y_{i-1}; x; \Theta), \tag{2}$$

where $\Theta$ denotes the set of parameters in the T5, and $y_i$ is the $i$-th token of the target response. We pre-train the perpetrator and the potential victim chatbots separately on the ConvAI2 dataset. We observe that the perpetrator chatbot tends to generate more leading dialogues while the potential victim chatbot generates response messages more consistently.

Fine-tuning the chatbots on the PJ dataset. A perpetrator usually follows the four grooming stages, as shown in Table 1, to gradually obtain trust from the potential victim and progressively achieve the final cybergrooming goal. To model the perpetrator's responses at the four stages, we fine-tune the four subchatbots for the perpetrator based on the in-domain PJ dataset. To obtain the messages for each stage, we cut conversations in the PJ dataset into several blocks and assign a stage to each block based on the criteria in Table 2. The connection strength of a block to the previous utterances is crucial for determining the location of each block and improving the quality of training for each stage. To split the conversations into blocks, we estimate two types of connectivity from the pre-trained BERT next sentence prediction model (Devlin et al., 2019): (1) the connectivity score, $g_1$, between each utterance and the last utterance from the perpetrator; and (2) the connectivity score, $g_2$, between each utterance and the last utterance from the victim. Thus, the connectivity between each utterance and the previous context is represented by $g_1 + g_2$. Furthermore, the beginning of each block is refined by comparing the connectivity scores of three utterances: the first utterance of the current block and its two previous utterances from the perpetrator. We use the utterance with the minimum $g_1 + g_2$ as the new beginning of the block. This allows us to refine the beginning of all the blocks and obtain four groups of blocks for the four stages.
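The block-start refinement described above can be sketched as follows. This is a hedged illustration, not the authors' implementation: the paper scores connectivity with a pre-trained BERT next sentence prediction model, whereas the sketch plugs in a toy word-overlap scorer (`jaccard_score`) so that it runs without any model. All function names, the zero contribution for missing context, and the tie-break toward the earliest utterance are assumptions.

```python
from typing import Callable, List, Optional, Tuple

Utterance = Tuple[str, str]  # (speaker, text); speaker is "perpetrator" or "victim"
Scorer = Callable[[str, str], float]


def jaccard_score(prev: str, cur: str) -> float:
    """Toy stand-in for an NSP-based connectivity score: word overlap in [0, 1]."""
    a, b = set(prev.lower().split()), set(cur.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def last_by(dialogue: List[Utterance], before: int, speaker: str) -> Optional[str]:
    """Text of the most recent utterance by `speaker` strictly before index `before`."""
    for i in range(before - 1, -1, -1):
        if dialogue[i][0] == speaker:
            return dialogue[i][1]
    return None


def connectivity(dialogue: List[Utterance], idx: int, score: Scorer) -> float:
    """g1 + g2 for the utterance at `idx` (missing context contributes 0, an assumption)."""
    text = dialogue[idx][1]
    prev_p = last_by(dialogue, idx, "perpetrator")
    prev_v = last_by(dialogue, idx, "victim")
    g1 = score(prev_p, text) if prev_p is not None else 0.0
    g2 = score(prev_v, text) if prev_v is not None else 0.0
    return g1 + g2


def refine_block_start(dialogue: List[Utterance], start: int,
                       score: Scorer = jaccard_score) -> int:
    """Choose the new block start among `start` and the two preceding
    perpetrator utterances: the candidate with the minimum g1 + g2 wins
    (ties go to the earliest index, an assumption)."""
    candidates = [start]
    i = start - 1
    while i >= 0 and len(candidates) < 3:
        if dialogue[i][0] == "perpetrator":
            candidates.append(i)
        i -= 1
    return min(candidates, key=lambda j: (connectivity(dialogue, j, score), j))
```

For a seven-utterance dialogue in which the topic changes at the perpetrator's third utterance (index 4), calling `refine_block_start(dialogue, 6)` moves the block start back to index 4, because that utterance shares no words with the preceding context. Replacing `jaccard_score` with a function that returns an NSP probability would bring the sketch closer to the procedure in the text.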
We fine-tune the four perpetrator subchatbots on the four groups of blocks separately. Further, we fine-tune a victim chatbot based on the victim utterances from the PJ dataset. Finally, to generate consistent and high-quality (i.e., human-like) conversations, we allow each chatbot to generate five candidate messages at each time and select the best one based on their connectivity scores to the previous message. The connectivity scores are computed based on the pre-trained BERT next sentence prediction model and used to ensure the consistency of a generated message with the earlier context.

Stage            Label distribution of each block
$\tilde{s}_1$    More than 80% of utterances are labeled as $\tilde{s}_1$
$\tilde{s}_2$    More than 60% of utterances are labeled as $\tilde{s}_2$
$\tilde{s}_3$    More than 50% of utterances are labeled as $\tilde{s}_3$
$\tilde{s}_4$    More than 40% of utterances are labeled as $\tilde{s}_4$

Table 2: Conversation segmentation criteria for the four relationship stages.

Stage evolution of the perpetrator subchatbot. We design a cybergrooming stage evolution for the chatbots by observing whether the conversation of each stage maintains a certain number of rounds (e.g., 20). If the conversation of stage $\tilde{s}_1$ lasts 20 rounds between the perpetrator and victim chatbots, the perpetrator will move to stage $\tilde{s}_2$. Once the victim detects the perpetrator's grooming intent, he/she will leave the chat conversation immediately, so the current stage lasts fewer than 20 rounds.
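The stage evolution rule can be illustrated with a short simulation. The sketch below is a simplification under stated assumptions: message generation is omitted, the victim's decision to leave is supplied as a callback `victim_leaves(stage, round_no)`, and the conversation is treated as complete once the fourth stage has run its full number of rounds. The function name and return format are hypothetical.

```python
from typing import Callable, List, Tuple


def simulate_stages(victim_leaves: Callable[[int, int], bool],
                    rounds_per_stage: int = 20,
                    num_stages: int = 4) -> Tuple[List[int], bool]:
    """Run the stage escalation loop.

    Returns (rounds completed in each stage visited, whether all stages finished).
    The perpetrator advances to the next stage only after `rounds_per_stage`
    rounds; if the victim leaves, the attack fails and the loop stops.
    """
    rounds_done: List[int] = []
    for stage in range(1, num_stages + 1):
        completed = 0
        for round_no in range(1, rounds_per_stage + 1):
            if victim_leaves(stage, round_no):
                rounds_done.append(completed)  # stage cut short
                return rounds_done, False
            # A real system would generate one perpetrator message and one
            # victim message here using the subchatbot for `stage`.
            completed += 1
        rounds_done.append(completed)
    return rounds_done, True


# Victim stays for the whole conversation.
full_run = simulate_stages(lambda stage, round_no: False)
# Victim detects the grooming intent in round 5 of stage 2 and leaves.
aborted_run = simulate_stages(lambda stage, round_no: stage == 2 and round_no == 5)
```

In the aborted run, the second stage records fewer than 20 rounds, which mirrors the statement that a stage in which the victim leaves lasts fewer than 20 rounds.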

Parameter            Value     Parameter     Value
Learning rate (lr)   5e−5      Epochs        4
Epsilon (ε)          1.0e−6    Batch size    8
Warmup steps         500       GPU           Yes
Early stopping       0         Vocabulary    T5-base

Table 3: Parameters and their default values used for the SERI framework.

4 Experiment Setup

Datasets. We trained our chatbots using two chat-log datasets. The ConvAI2 dataset (Dinan et al., 2019) is a two-person casual chat dataset in JSON format with several different repeated labels. The sentences with the "history" label fit best for our task. Hence, we use 2,000 dialogues with more than 60K utterances from the ConvAI2. We manually downloaded the PJ dataset from the PJ website^1 in HTML format. It contains 100 dialogues with more than 100K chat records between perpetrators and professionally trained volunteering undercover police officers mimicking potential victims^2. We randomly divided the PJ dataset into a train set, a valid set, and a test set with a ratio of 8:1:1. Table 3 summarizes the key parameters of the T5 model.

^1 http://www.perverted-justice.com/?archive=byUserVotes
^2 http://www.perverted-justice.com/index.php?pg=policeinfo

Role          BLEU       ROUGE     BERTScore
              Max:100    Max:1     Max:
Perpetrator   2.9906     0.0970    0.
Victim        2.6884     0.1063    0.

Table 4: BLEU, ROUGE, and BERTScore-based analysis for the conversations generated by the SERI.

Data cleaning. The ConvAI2 dataset is well-organized and ready for training our chatbots. However, the PJ dataset contains a lot of noise, such as URLs, hashtags, mentions, or emojis. We removed this noise with regular expressions using the Python library 'Preprocessor'. There are repeated occurrences of informal language, such as lexical slang and consecutive words without spaces. To segment consecutive words with spaces, we applied the 'wordsegment' library in Python. Lexical slang can be normalized with a state-of-the-art lexical normalization model, called MoNoise (van der Goot, 2019).

Metrics. To evaluate the performance of our chatbot, we use both referenced and unreferenced metrics for evaluating automatic dialogues (Finch and Choi, 2020). For referenced metrics, we use BLEU (Post, 2018), ROUGE (Lin, 2004), and BERTScore (Zhang et al., 2020) to evaluate the quality of the chatbot-generated utterances by comparing them against the ground truth from the PJ dataset. For unreferenced metrics, we use perplexity and MaUde scores (Sinha et al., 2020). Perplexity measures how easily a sentence can be understood; lower perplexity indicates higher fluency. MaUde measures multiple aspects of language quality in terms of fluency, reasonableness (i.e., logical flow), or avoiding repetition. We compare the perplexity and MaUde scores of the ground truth utterances from the PJ dataset with those of the conversations generated by our proposed SERI.

We also conducted human evaluation by randomly selecting 200 conversation samples, where each sample contains 4 history utterances and 2 target utterances (i.e., the original utterance from the PJ dataset and an utterance generated by the SERI). For each sample, we ask two graduate students and one NLP expert to compare the two target utterances and select which one is more valid and consistent with the history utterances than the other.
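One simple way to turn the three annotators' choices into a per-sample decision is a majority vote. The text does not state how the judgments were aggregated, so the sketch below is an assumption for illustration only; the labels "seri" and "pj" and the function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def majority_choice(votes: Sequence[str]) -> str:
    """Return the label chosen by most annotators.

    With three annotators and two labels, a tie cannot occur.
    """
    return Counter(votes).most_common(1)[0][0]


def preference_rate(samples: List[Sequence[str]], label: str = "seri") -> float:
    """Fraction of samples whose majority vote prefers `label`."""
    wins = sum(1 for votes in samples if majority_choice(votes) == label)
    return wins / len(samples)


# Four made-up samples, each judged by three annotators.
toy_samples = [
    ("seri", "seri", "pj"),
    ("pj", "pj", "seri"),
    ("pj", "pj", "pj"),
    ("seri", "pj", "seri"),
]
toy_rate = preference_rate(toy_samples)
```

The toy votes are fabricated purely to exercise the code and carry no relation to the reported results.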

Ethical Statement

Our goal in developing the SERI is to simulate the authentic conversations between perpetrators and potential victims, especially human youth users. A general approach to ensure proper rather than malicious application should incorporate ethical considerations as first-order principles in each step of the system design. In this paper, we focus on developing a chatbot approach to educate youth users by increasing their awareness of and sensitivity to cybergrooming and its consequences, and accordingly protect them from cybergrooming. We acknowledge the pros and cons of releasing details of the SERI. Here we provide some example scenarios where the SERI should or should not be used:

  • Should-Do: Educational parties use the SERI to develop curricula to educate youth in terms of how to respond to online abusive messages and avoid cybergrooming when a youth has a chance to have online conversations with a stranger or acquaintance talking about sexually sensitive or private information.
  • Should-Do: Parents who want to learn grooming conversations to educate their children to be resistant and resilient against the potential risk of encountering sexual predators.
  • Should-Not-Do: Anyone using the SERI as a tool for online sexual exploitation or abuse of children.

Besides the above regulations, which we will use to ensure the proper and ethical use of SERI, we will also design several strategies to prevent misuse and its adverse influence:

  • First, part of the adverse influence and ethical concerns of SERI lies in the sensitive and inappropriate language used by the chatbots. To mitigate this issue, we will design approaches and leverage linguistic resources, such as profane-word lexicons^3, to replace filthy words in the training dataset with moderate ones and balance between simulating a realistic cybergrooming scenario and avoiding any potential ethical issues or bad influence on youths.
  • Instead of releasing the source code and models of SERI to the public, we will make them accessible only to parties for research purposes by request.
  • When delivering SERI as an education program, we will only include the perpetrator chatbot and allow youths to chat with it. We will design approaches to monitor the language generated by the chatbot and have the monitoring system or the users stop the conversation whenever filthy language is detected. This will prevent the SERI from being misused by a bad party, as the SERI will stop working when the user is detected as an adult or potential perpetrator. Finally, the conversational data will be encrypted and stored under the regulations and standards stated in legal frameworks, such as GDPR^4.

^3 https://www.cs.cmu.edu/~biglou/resources/
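The monitoring strategy above could, in its simplest form, be a lexicon check applied to each generated message before it is shown. The sketch below is a hypothetical illustration of that idea, not a design from the paper: the lexicon contents, the tokenization rule, and the function names are all assumptions, and a deployed monitor would likely need more than exact word matching.

```python
import re
from typing import Iterable, List, Set, Tuple


def flagged_terms(message: str, lexicon: Set[str]) -> Set[str]:
    """Return the lexicon words that occur in `message` (case-insensitive)."""
    tokens = re.findall(r"[a-z0-9']+", message.lower())
    return {t for t in tokens if t in lexicon}


def deliver_until_flagged(messages: Iterable[str],
                          lexicon: Set[str]) -> Tuple[List[str], bool]:
    """Pass messages through until one contains a flagged term.

    Returns (messages delivered, whether the conversation was stopped).
    The flagged message itself is not delivered.
    """
    delivered: List[str] = []
    for message in messages:
        if flagged_terms(message, lexicon):
            return delivered, True
        delivered.append(message)
    return delivered, False


# Placeholder lexicon entry; a real lexicon would come from a curated resource.
lexicon = {"blockedword"}
delivered, stopped = deliver_until_flagged(
    ["hello there", "nice to meet you", "this has a BlockedWord!", "never sent"],
    lexicon,
)
```

Stopping before delivery, rather than after, keeps a flagged message from ever reaching the youth user, which fits the stated goal of avoiding exposure to upsetting language.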

References

P. Anderson, Z. Zuo, L. Yang, and Y. Qu. 2019. An intelligent online grooming detection system using AI technologies. In 2019 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pages 1–6.

P. Bours and H. Kulsrud. 2019. Detection of cyber grooming in online conversation. In 2019 IEEE International Workshop on Information Forensics and Security (WIFS), pages 1–6.

T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al. 2020. Language models are few-shot learners. arXiv preprint arXiv:2005.14165.

K. R. Choo. 2009. Online child grooming: A literature review on the misuse of social networking sites for grooming children for sexual offences, volume 103. Canberra: Australian Institute of Criminology.

J. Devlin, M. Chang, K. Lee, and K. Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT, pages 4171–4186.

Z. Dhouioui and J. Akaichi. 2016. Privacy protection protocol in social networks based on sexual predators detection. In Proceedings of the International Conference on Internet of Things and Cloud Computing, ICC'16, New York, NY, USA. Association for Computing Machinery.

E. Dinan, V. Logacheva, V. Malykh, A. Miller, K. Shuster, J. Urbanek, D. Kiela, A. Szlam, I. Serban, R. Lowe, et al. 2019. The second conversational intelligence challenge (ConvAI2). arXiv preprint arXiv:1902.00098.

^4 https://gdpr-info.eu/

M. A. Fauzi and P. Bours. 2020. Ensemble method for sexual predators identification in online chats. In 2020 8th International Workshop on Biometrics and Forensics (IWBF), pages 1–6. IEEE.

S. E. Finch and J. D. Choi. 2020. Towards unified dialogue system evaluation: A comprehensive analysis of current evaluation protocols. CoRR, abs/2006.06110.

F. E. Gunawan, L. Ashianti, and N. Sekishita. 2018. A simple classifier for detecting online child grooming conversation. TELKOMNIKA, 16(3):1239–1248.

C. Laorden, P. Galán-García, I. Santos, B. Sanz, J. M. Hidalgo, and P. G. Bringas. 2013. Negobot: A conversational agent based on game theory for the detection of paedophile behaviour. In International Joint Conference CISIS'12-ICEUTE 12-SOCO 12 Special Sessions, pages 261–270. Springer.

M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, and L. Zettlemoyer. 2019. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461.

C. Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81, Barcelona, Spain. Association for Computational Linguistics.

S. Marchenko. 2017. Web of darkness: Groomed, manipulated, coerced, and abused in minutes.

Koons Family Institute on International Law and Policy. 2017. Online Grooming of Children for Sexual Purposes: Model Legislation & Global Review. International Centre for Missing and Exploited Children.

Perverted Justice Foundation Inc. 2020. Perverted-justice.com archives.

M. Post. 2018. A call for clarity in reporting BLEU scores. In Proceedings of the Third Conference on Machine Translation: Research Papers, pages 186–191, Belgium, Brussels. Association for Computational Linguistics.

A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever. 2018. Improving language understanding by generative pre-training.

A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9.

C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67.

K. Sinha, P. Parthasarathi, J. Wang, R. Lowe, W. L. Hamilton, and J. Pineau. 2020. Learning an unreferenced metric for online dialogue evaluation. ACL.

A. M. Turing. 2009. Computing machinery and intelligence. In Parsing the Turing Test, pages 23–65. Springer.

UNICEF. 2017. The state of the world's children 2017: Children in a digital world.

R. van der Goot. 2019. MoNoise: A multi-lingual and easy-to-use lexical normalization tool. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 201–206, Florence, Italy. Association for Computational Linguistics.

G. Winters and E. Jeglic. 2016. Stages of sexual grooming: Recognizing potentially predatory behaviors of child molesters. Deviant Behavior, pages 1–10.

T. Wolf, V. Sanh, J. Chaumond, and C. Delangue. 2019. TransferTransfo: A transfer learning approach for neural network based conversational agents. CoRR, abs/1901.08149.

P. Zambrano, J. Torres, L. Tello-Oquendo, R. Jácome, M. E. Benalcázar, R. Andrade, and W. Fuertes. 2019. Technical mapping of the grooming anatomy using machine learning paradigms: An information security approach. IEEE Access, 7:142129–142146.

T. Zhang, V. Kishore, F. Wu, K. Q. Weinberger, and Y. Artzi. 2020. BERTScore: Evaluating text generation with BERT. In International Conference on Learning Representations.

Y. Zhang, S. Sun, M. Galley, Y. Chen, C. Brockett, X. Gao, J. Gao, J. Liu, and B. Dolan. 2019. DialoGPT: Large-scale generative pre-training for conversational response generation. arXiv preprint arXiv:1911.00536.