













This paper explores the novel task of automatically generating headlines for detailed online math questions. It introduces a deep learning model, MathSum, designed to jointly model textual and mathematical information, capturing both semantic and structural features of equations. The model is evaluated on two large datasets, where it generates concise and informative headlines that summarize complex mathematical questions.














Ke Yuan, Dafang He, Zhuoren Jiang, Liangcai Gao, Zhi Tang, C. Lee Giles arXiv (arXiv: 1912.00839v1) Generated on April 29, 2025
Automatic Generation of Headlines for Online Math Questions
Ke Yuan (1, 2), Dafang He (2), Zhuoren Jiang (3), Liangcai Gao (1), Zhi Tang (1), C. Lee Giles
1 Wangxuan Institute of Computer Technology, Peking University, Beijing, 100080, China
2 The Pennsylvania State University, University Park, PA 16802, USA
3 School of Data and Computer Science, Sun Yat-sen University, Guangzhou, 510006, China

Abstract

Mathematical equations are an important part of dissemination and communication of scientific information. Students, however, often feel challenged in reading and understanding math content and equations. With the development of the Web, students are posting their math questions online. Nevertheless, constructing a concise math headline that gives a good description of the posted detailed math question is nontrivial. In this study, we explore a novel summarization task denoted as geNerating A concise Math hEadline from a detailed math question (NAME). Compared to conventional summarization tasks, this task has two extra and essential constraints: 1) Detailed math questions consist of text and math equations, which require a unified framework to jointly model textual and mathematical information; 2) Unlike text, math equations contain semantic and structural features, and both of them should be captured together. To address these issues, we propose MathSum, a novel summarization model which utilizes a pointer mechanism combined with a multi-head attention mechanism for mathematical representation augmentation. The pointer mechanism can copy either textual tokens or math tokens from source questions in order to generate math headlines. The multi-head attention mechanism is designed to enrich the representation of math equations by modeling and integrating both their semantic and structural features. For evaluation, we collect and make available two sets of real-world detailed math questions along with human-written math headlines, namely EXEQ-300k and OFEQ-10k. Experimental results demonstrate that our model (MathSum) significantly outperforms state-of-the-art models on both the EXEQ-300k and OFEQ-10k datasets.
Figure 1: Example of a detailed math question along with its headline. The question is complex and long and the headline is clear and brief.

Detailed Math Question (complex and long): [...] for parabolic equations using the Galerkin approximation I encountered the following problem: Assume that $\Omega \subseteq \mathbb{R}^d$ is an open set and $\{w_k\}_{k=1}^{\infty}$ is an orthonormal basis of $L^2(\Omega)$ such that $\{w_k\}_{k=1}^{\infty}$ is also orthogonal in $W_0^{1,2}(\Omega)$. For every $n \in \mathbb{N}$ let $P_n$ be the $L^2$-orthogonal projection onto $\mathrm{span}\{w_k\}_{k=1}^{n}$, i.e. $P_n(u) = \sum_{k=1}^{n} (u, w_k)_{L^2(\Omega)} w_k = \sum_{k=1}^{n} \int_{\Omega} u(x) w_k(x)\,dx\; w_k$, $u \in L^2(\Omega)$. It is clear that $\|P_n(u)\|_{L^2(\Omega)} \le \|u\|_{L^2(\Omega)}$ for every $n \in \mathbb{N}$ and $u \in L^2(\Omega)$. However, what I need is the following: $\exists C > 0\ \forall n \in \mathbb{N}\ \forall u \in W_0^{1,2}(\Omega): \|P_n(u)\|_{W_0^{1,2}(\Omega)} \le C \|u\|_{W_0^{1,2}(\Omega)}$. I'm not even sure it is true, but I need it to obtain some a priori estimates. I'll appreciate any help.

Math Headline (clear and brief): Orthogonal projection in $L^2(\Omega)$ and $W_0^{1,2}(\Omega)$

From the viewpoint of questioners, the contents of detailed math questions are usually complex and long. In order to efficiently help those who pose the question, it would be helpful to have a headline which is concise and to the point. Correspondingly, those who will answer the question (answerers) also need a clear and brief headline to quickly determine whether they should respond. Therefore, giving a concise math headline to a detailed question is important and meaningful. Figure 1 illustrates an example of a question along with its headline posted on Mathematics Stack Exchange. Clearly, a complicated question can make it difficult for answerers to understand the intent of the questioner, while a concise headline can effectively reduce the cost of this operation.

To this end, we explore a novel approach for geNerating A Math hEadline for detailed questions (NAME). Here, we define the NAME task as a summarization task. Compared to conventional summarization tasks, the NAME task has two extra essential issues that need to be addressed:
[...] unified framework. For instance, Yasunaga and Lafferty (2019) attempted to utilize both text and mathematical representations, but treated them as separate components. We argue that this approach loses much crucial information, e.g., the position of and the semantic dependency between text and equations. 2) Capturing semantic and structural features of math equations synchronously. Unlike text, math equations contain not only semantic features but also structural features. For instance, the equations “f=a b” and “fb=a” have the same semantic features but different structural features. However, most existing research considers only one of these two characteristics. For instance, some work (Yuan et al. 2016; Zanibbi et al. 2016) only considered the structural information of equations for mathematical information retrieval tasks, while other work (Deng et al. 2017; Yasunaga and Lafferty 2019) treated a math equation as basic symbols and modeled them as text, which led to a loss of structural features.

To address these issues, we propose MathSum, a novel method that combines pointers with multi-head attention for mathematical representation augmentation. The pointer mechanism can copy either textual tokens or math tokens from source questions in order to generate math headlines. The multi-head attention mechanism is designed to enrich the representation of each math equation separately by modeling and integrating both semantic and structural features. For evaluation, we construct two large datasets (EXEQ-300k and OFEQ-10k) which contain 290,479 and 12,548 detailed questions with corresponding math headlines from Mathematics Stack Exchange and MathOverflow, respectively. We compare our model with several abstractive and extractive baselines. Experimental results demonstrate that our model significantly outperforms several strong baselines on the NAME task.
In summary, the contributions of our work are: 1) an innovative NAME task for generating a concise math headline from a given detailed math question; 2) a novel summarization model, MathSum, that addresses the essential issues of the NAME task, in [...]
Extractive methods (Mihalcea and Tarau 2004; Nishikawa et al. 2014) extract sentences from the original document to form the summary. Abstractive methods (See, Liu, and Manning 2017; Tan, Wan, and Xiao 2017a; Narayan, Cohen, and Lapata 2018; Gavrilov, Kalaidin, and Malykh 2019) aim at generating the summary based on understanding the document.

We view headline generation as a special type of summarization, with the constraint that only a short sequence of words is generated and that it preserves the essential meaning of a math question document. Recently, headline generation methods with end-to-end frameworks (Tan, Wan, and Xiao 2017b; Narayan, Cohen, and Lapata 2018; Zhang et al. 2018; Gavrilov, Kalaidin, and Malykh 2019) have achieved significant success. Math headline generation is similar to existing headline generation tasks but differs in several aspects. The major difference is that a math headline consists of text and math equations, which requires jointly modeling and inferring text and math equations.

Table 1: Statistics of EXEQ-300k and OFEQ-10k (ques. = detailed question (source); headl. = math headline (target)).

| Statistic | EXEQ-300k ques. | EXEQ-300k headl. | OFEQ-10k ques. | OFEQ-10k headl. |
|---|---|---|---|---|
| avg. math equation number | 6.08 | 1.72 | 8.56 | 1.41 |
| avg. textual token number | 60.65 | 7.72 | 105.92 | 8.61 |
| avg. math equation token number | 12.27 | 9.91 | 10.04 | 6.84 |
| avg. sentence number | 4.68 | 1.52 | 6.53 | 1.40 |
| text vocabulary size | 84,272 | 21,568 | 25,733 | 6,721 |
| math vocabulary size | 1,049 | 663 | 581 | 393 |
Table 2: Statistics of the two datasets (EXEQ-300k and OFEQ-10k) with respect to the overall number of collected question pairs and the number of correct question pairs.

| Dataset | Question pairs | Correct question pairs |
|---|---|---|
| EXEQ-300k | 346,202 | 290,479 |
| OFEQ-10k | 13,408 | 12,548 |

Figure 2: Proportion of novel n-grams for the gold standard math headlines in EXEQ-300k and OFEQ-10k.

Task and Dataset

Task Definition
We define the NAME task as a summarization task. Let $S = (s_0, s_1, \ldots, s_N)$ denote the sequence of the input detailed question, where $N$ is the number of tokens in the source, $s \in \{s^w, s^e\}$, $s^w$ represents a textual token (word), and $s^e$ indicates a math token (footnote 5). For each input $S$, there is a corresponding output math headline with $M$ tokens, $Y = (y_0, y_1, \ldots, y_M)$, where $y \in \{y^w, y^e\}$ and $y^w$, $y^e$ are textual tokens and math tokens, respectively. The goal of NAME is to generate a math headline learned from the input question, namely $S \to Y$.

5 A math token is the fundamental element which can form a math equation (Deng et al. 2017).

Dataset

Since the NAME task is new, we could find no public benchmark dataset. As such, we build two real-world math datasets, EXEQ-300k (from Mathematics Stack Exchange) and OFEQ-10k (from MathOverflow), for model training and evaluation. Both datasets consist of detailed questions with corresponding math headlines. In EXEQ-300k and OFEQ-10k, each question is a detailed math question, and the corresponding headline is a human-written question summary with math equations, typically written by the questioner. In Mathematics Stack Exchange and MathOverflow, math equations are enclosed by the “$$” symbols. We use in our datasets “
[...] learns to generate headlines from the learned representation.

For the encoder, the crucial issue is to build effective representations for tokens in an input question. As mentioned in the NAME task definition, there are two different token types (textual and math), and their characteristics are intrinsically different. Math tokens contain not only semantic features (mathematical meaning) but also structural features (e.g., super/subscript, numerator/denominator, recursive structure). Therefore, the representation learning should vary according to the token type. In this study, we introduce a multi-head attention mechanism to enrich the representation of math tokens.

The token $s_i$ of the input question $S$ is first converted into a continuous vector representation $\mathbf{s}_i$, so that the vector representation of the input is $\mathbf{S} = [\mathbf{s}_0, \ldots, \mathbf{s}_N]$, where $N$ is the number of tokens in the input and $\mathbf{s}^w$, $\mathbf{s}^e$ are the vector representations of textual and math tokens, respectively. Then the vectors of math tokens within an equation are fed into a block with multi-head attention (Vaswani et al. 2017), which enriches their representation by considering both semantic and structural features. Note that each equation in the input is fed into the block separately, since an equation is the fundamental unit for characterizing the semantic and structural features of a series of math tokens. Let $M_k = \{\mathbf{s}^e_j, \ldots, \mathbf{s}^e_{j+m}\}$ denote the initial vector representation of the $k$-th math equation with $m$ math tokens as input. Then the multi-head attention block transforms $\mathbf{s}^e_i$ into its enriched representation $\mathbf{s}'^{e}_i$. This is calculated by

$$\mathbf{s}'^{e}_i = f_{\mathrm{Multihead}}\big(\mathbf{s}^e_i, [\mathbf{s}^e_j, \ldots, \mathbf{s}^e_{j+m}]\big), \quad i \in \{j, \ldots, j+m\} \tag{1}$$

where $f_{\mathrm{Multihead}}$ is the multi-head attention block, $j$ is the beginning index of math equation $M_k$, and $j+m$ is the end index.

After that, the enriched vector representation of the input, $\mathbf{S}' = [\mathbf{s}'_0, \ldots, \mathbf{s}'_N]$ where $\mathbf{s}' \in \{\mathbf{s}^w, \mathbf{s}'^{e}\}$, is fed into the update layer (a single-layer bidirectional LSTM) one token at a time. The hidden state $h_i$ is updated according to the previous hidden state $h_{i-1}$ and the current token vector $\mathbf{s}'_i$:

$$h_i = f(h_{i-1}, \mathbf{s}'_i) \tag{2}$$

where $f$ is the dynamic function of the LSTM unit and $h_i$ is the hidden state of token $\mathbf{s}'_i$ at step $i$.

In the decoder, we aggregate the encoder hidden states $h_0, \ldots, h_N$ using a weighted sum that becomes the context vector $C_t$:

$$C_t = \sum_i \alpha^t_i h_i \tag{3}$$

where

$$\alpha^t = \mathrm{softmax}(e^t), \qquad e^t_i = \nu^{T} \tanh(W_h h_i + W_{h'} h'_t + b_{attn}) \tag{4}$$

Here $\nu$, $W_h$, $W_{h'}$ and $b_{attn}$ are learnable parameters, and $h'_t$ is the hidden state of the decoder at time step $t$. The attention $\alpha^t$ is a distribution over the input positions.

At this point, the generated math headline may contain textual tokens or math tokens from the source which could be out-of-vocabulary. Thus, we utilize a pointer network (See, Liu, and Manning 2017) to directly copy tokens from the source. Considering that the token $w$ may be copied from the source or generated from the vocabulary, we use the copy probability $p_c$ as a soft switch to choose between tokens copied from the input and textual tokens generated from the vocabulary:

$$p(y_t = w \mid S, y_{<t}) = p_c \sum_{i: w_i = w} \alpha^t_i + (1 - p_c) f(h'_t, C_t), \qquad p_c = f(C_t, h'_t, x_t) \tag{5}$$

where $f$ is a non-linear function and $x_t$ is the decoder input at time step $t$.
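As an illustration of the decoder step in Equations (3) to (5), the following Python sketch computes the attention-weighted context vector and mixes the copy distribution with the vocabulary distribution. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the toy tokens and every number are made up for this example, and the attention scores, the generation distribution and the copy probability are supplied by hand rather than learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_vector(alpha, encoder_states):
    """Eq. (3): attention-weighted sum of the encoder hidden states."""
    dim = len(encoder_states[0])
    return [sum(a * h[d] for a, h in zip(alpha, encoder_states)) for d in range(dim)]


def output_distribution(source_tokens, alpha, vocab_probs, p_copy):
    """Eq. (5): mix the copy distribution with the vocabulary distribution."""
    mixed = {w: (1.0 - p_copy) * p for w, p in vocab_probs.items()}
    for token, a in zip(source_tokens, alpha):
        # Every source position holding the same token adds its attention mass.
        mixed[token] = mixed.get(token, 0.0) + p_copy * a
    return mixed


# Toy inputs: hypothetical values chosen only for illustration.
source = ["interior", "of", "\\mathbb{CP}", "^", "2"]
alpha = softmax([0.2, -1.0, 1.5, 0.3, 0.3])            # stands in for softmax(e^t), Eq. (4)
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.0, 0.0]]
vocab = {"interior": 0.5, "of": 0.3, "triangle": 0.2}  # stands in for the generation distribution

c_t = context_vector(alpha, states)
dist = output_distribution(source, alpha, vocab, p_copy=0.6)
```

Because the copy term adds attention mass to source tokens, a math token such as `\mathbb{CP}` receives probability in `dist` even though it is absent from the toy vocabulary, and the mixed distribution still sums to one.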
[...] attention block contains 4 heads and 256-dimensional hidden states for the feed-forward part. The model is trained using AdaGrad (Duchi, Hazan, and Singer 2011) with a learning rate of 0.2, an initial accumulator value of 0.1, and a batch size of 16. We set the dropout rate to 0.3. The vocabulary sizes of the question and headline are both 50,000. In addition, the encoder and decoder share the token representations. At test time, we decode the math headline using beam search with a beam size of 3. We set the minimum length to 20 tokens on EXEQ-300k and 15 tokens on OFEQ-10k. We implement our model in PyTorch and train on a single Titan X GPU.

Experimental Results

Quantity Performance

Metrics

We use three standard metrics for evaluation: ROUGE (Lin 2004), BLEU (Papineni et al. 2002) and METEOR (Denkowski and Lavie 2014). The ROUGE metric measures summary quality by counting the overlapping units (e.g., n-grams) between the generated summary and the reference summaries. We report the F1 scores for R1 (ROUGE-1), R2 (ROUGE-2), and RL (ROUGE-L). The BLEU score is widely used as an accuracy measure for machine translation and computes the n-gram precision of a candidate sequence with respect to the reference. METEOR is recall-oriented and evaluates translation hypotheses by aligning them to reference translations and calculating sentence-level similarity scores. The BLEU and METEOR scores are calculated using the nlg-eval package (footnote 11), and the ROUGE scores are based on the rouge-baselines package (footnote 12).

We use edit distance and exact match to check the similarity of the generated equations to the gold standard equations in the math headlines. These two metrics are widely used for the evaluation of equation generation (Deng et al. 2017; Wu et al. 2018). Edit distance quantifies how dissimilar two strings are by counting the minimum number of operations required to transform one string into the other. Based on $N$ samples in the test set, we use two types of edit distance.
One is Edit Distance(m), which is a math-level dissimilarity score and is defined as

$$\mathrm{EditDistance}(m) = \frac{\sum_{i=0}^{N} \min Md_i}{\max(|P_i|, |G_i|)}$$

where $\min Md_i$ is the minimum edit distance between equations in the generated headline and the gold standard headline, and $|P_i|$ and $|G_i|$ are the numbers of equations in the $i$-th generated headline and gold headline. The other, Edit Distance(s), is the sentence-level dissimilarity score and is formulated as

$$\mathrm{EditDistance}(s) = \frac{\sum_{i=0}^{N} \min Md_i}{N}$$

Exact Match checks the exact match accuracy between the gold standard math tokens and the generated math tokens and is calculated as

$$\mathrm{ExactMatch} = \frac{\sum_{i=0}^{N} (PM_i \,\&\, GM_i)}{N}$$

where $PM_i$ and $GM_i$ are the sets of math tokens in the $i$-th generated headline and gold standard headline.

11 https://github.com/Maluuba/nlg-eval
12 https://github.com/sebastianGehrmann/rouge-baselines

Table 3: Comparison of different models on the EXEQ-300k and OFEQ-10k test sets for F1 scores of R1 (ROUGE-1), R2 (ROUGE-2), RL (ROUGE-L), BLEU-4, and METEOR.

| Models | EXEQ-300k R1 | EXEQ-300k R2 | EXEQ-300k RL | EXEQ-300k BLEU-4 | EXEQ-300k METEOR | OFEQ-10k R1 | OFEQ-10k R2 | OFEQ-10k RL | OFEQ-10k BLEU-4 | OFEQ-10k METEOR |
|---|---|---|---|---|---|---|---|---|---|---|
| Random | 31.56 | 21.35 | 28.99 | 24.32 | 23.40 | 22.95 | 11.48 | 19.85 | 13.19 | 18. |
| Tail | 22.55 | 14.69 | 20.76 | 22.23 | 23.78 | 15.46 | 7.03 | 13.36 | 11.13 | 11. |
| Lead | 42.23 | 31.30 | 39.29 | 29.89 | 31.61 | 27.68 | 14.92 | 24.07 | 14.56 | 20. |
| TextRank | 42.19 | 30.85 | 38.99 | 28.29 | 31.78 | 29.66 | 16.41 | 25.59 | 14.20 | 23. |
| Seq2Seq | 52.14 | 38.33 | 49.00 | 42.20 | 30.65 | 38.64 | 23.42 | 35.24 | 27.67 | 25. |
| PtGen | 53.26 | 39.92 | 50.09 | 44.10 | 31.76 | 40.27 | 25.30 | 36.51 | 28.07 | 25. |
| Transformer | 54.49 | 40.57 | 50.90 | 45.79 | 32.92 | 40.54 | 24.36 | 36.39 | 28.82 | 25. |
| MathSum | 57.53 | 45.62 | 54.81 | 52.00 | 37.47 | 42.44 | 28.15 | 38.99 | 29.44 | 26. |

| Models | EXEQ-300k Edit Distance(m) | EXEQ-300k Edit Distance(s) | EXEQ-300k Exact Match | OFEQ-10k Edit Distance(m) | OFEQ-10k Edit Distance(s) | OFEQ-10k Exact Match |
|---|---|---|---|---|---|---|
| Random | 8.76 | 21.84 | 9.29 | 7.20 | 17.73 | 5. |
| Tail | 9.42 | 20.89 | 6.65 | 7.30 | 14.45 | 3. |
| Lead | 7.47 | 20.27 | 12.39 | 6.58 | 17.75 | 6. |
| TextRank | 7.68 | 21.36 | 12.68 | 6.75 | 20.27 | 7. |
| Seq2Seq | 6.68 | 13.57 | 13.26 | 8.69 | 16.78 | 8. |
| PtGen | 6.59 | 13.43 | 13.60 | 8.06 | 15.56 | 8. |
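The equation-level metrics defined above can be sketched in Python as follows. This is a hedged illustration, not the evaluation script used in the paper. It assumes that the per-sample minimum edit distance is the smallest Levenshtein distance over all pairs of generated and gold equations, that a headline without equations is treated as one empty equation, and that Exact Match counts a sample when the two sets of math tokens are identical. The function names and toy samples are hypothetical, and only Edit Distance(s) and Exact Match are covered.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute, or keep on a match
        prev = curr
    return prev[-1]


def min_equation_distance(pred_eqs, gold_eqs):
    # Assumption: the minimum is taken over all (generated, gold) equation pairs,
    # and a headline without equations counts as one empty equation.
    pred = pred_eqs or [[]]
    gold = gold_eqs or [[]]
    return min(levenshtein(p, g) for p in pred for g in gold)


def edit_distance_s(samples):
    """Sentence-level score: summed per-sample minimum distance divided by N."""
    return sum(min_equation_distance(p, g) for p, g in samples) / len(samples)


def exact_match(samples):
    # Assumption: a sample matches when the generated and gold math-token sets are equal.
    hits = sum(
        {t for eq in p for t in eq} == {t for eq in g for t in eq}
        for p, g in samples
    )
    return hits / len(samples)


# Each sample pairs the generated equations with the gold equations,
# where an equation is a list of math tokens (toy data).
samples = [
    ([["x", "^", "2"]], [["x", "^", "2"]]),
    ([["a", "+", "b"]], [["a", "-", "b"]]),
]
score_s = edit_distance_s(samples)
score_em = exact_match(samples)
```

Operating on token lists rather than raw strings keeps a multi-character math token such as `\frac` as a single edit unit; whether the paper measures distance over tokens or characters is not stated in this text.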
[...] MathSum gets the best performance for Exact Match and the second-best performance (slightly weaker than Transformer) for Edit Distance(m) and Edit Distance(s). A possible reason is that in OFEQ-10k, the math equations in source questions are usually long, while the ones in headlines are often short. Compared to the Transformer, the copying mechanism could cause MathSum to copy long equations from the source questions, which may result in slightly decreased performance on the Edit Distance(m) and Edit Distance(s) metrics.

Quality Analysis

Jointly modeling quality. The heatmap in Figure 4 visualizes the attention weights from MathSum. Figure 4(a) compares the source detailed question with its human-written math headline and the generated math headline from MathSum. As Figure 4 shows, there are both textual tokens and math tokens in the generated headline. Note that both math tokens and textual tokens can be effectively aligned to their corresponding tokens in the source. For instance, the textual tokens “coordinate” and “triangle” and the math tokens “P” and “C” are all successfully aligned.

Figure 4: Heatmap of attention weights for source detailed questions. MathSum learns to align key textual tokens and math tokens with the corresponding tokens in the source question.
(a) An example of a detailed question
Detailed Question: In $\mathbb{CP}^2$ we define coordinate triangle to be the one with sides $\{x_0 = 0\}$, $\{x_1 = 0\}$ and $\{x_2 = 0\}$. How would you define its interior? What kind of equation should it satisfy?
Human-Written Math Headline: interior of a triangle in $CP^2$
MathSum Generated Math Headline: interior of coordinate triangle in $\mathbb{CP}^2$.
(b) Attention weights for partial source detailed question tokens

Case study. To gain an understanding of the generation quality of our method, we present three typical examples in Table 5. The first two are selected from EXEQ-300k (footnotes 13 and 14) and the last one is selected from OFEQ-10k (footnote 15).
From the examples, we see that the generated headlines are comparable and similar to the human-written headlines. Generally, the generated headlines are coherent, grammatical, and informative. We also observe that it is important to locate the main equations for the NAME task. If the generation method emphasizes a subordinate equation, it will generate an unsatisfactory headline, such as the second example in Table 5.

Conclusions and Future Work

Here we define and explore the novel NAME task of automatic headline generation for online math questions using a new deep model, MathSum. Two new datasets (EXEQ-300k and OFEQ-10k) are constructed for algorithm training and testing and are made available. Our experimental results demonstrate that our model can often generate useful math headlines and significantly outperforms a series of state-of-the-art models. Future work could focus on enriched representations of math equations for mathematical information retrieval and other math-related research.

13 https://math.stackexchange.com/questions/
14 https://math.stackexchange.com/questions/
15 https://mathoverflow.net/questions/291434

Examples

Partial Math Detailed Question (EXEQ-300k): So I am asked to find the inverse elements of this set $\mathbb{Z}[i] = \{a + ib \mid a, b \in \mathbb{Z}\}$ (I know that this is the set of Gaussian integers). I was pretty much do...
Human-Written: finding the inverse elements of $\mathbb{Z}[i] = \{a + ib \mid a, b \in \mathbb{Z}\}$
MathSum: finding the inverse elements of $\mathbb{Z}[i] = \{a + ib \mid a, b \in \mathbb{Z}\}$

Partial Math Detailed Question (EXEQ-300k): Suppose that the function : $\mathbb{R}^2 \to \mathbb{R}$ is continuously differentiable. Define the function $g: \mathbb{R}^2 \to \mathbb{R}$ by...
Human-Written: using the chain rule in $\mathbb{R}^n$
MathSum: find $\frac{\partial g}{\partial s}(s, t)$

Partial Math Detailed Question (OFEQ-10k): In the paper of Herbert Clemens Curves on generic hypersurfaces the author shows that for a generic hypersurface $V$ of $\mathbb{P}^n$ of sufficiently high degree there is no rational...
[...] Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 37–46. ACM.
[Krstovski and Blei 2018] Krstovski, K., and Blei, D. M. 2018. Equation embeddings. arXiv preprint arXiv:1803..
[Le, Indurkhya, and Nakagawa 2019] Le, A. D.; Indurkhya, B.; and Nakagawa, M. 2019. Pattern generation strategies for improving recognition of handwritten mathematical expressions. arXiv preprint arXiv:1901..
[Lin 2004] Lin, C.-Y. 2004. Rouge: A package for automatic evaluation of summaries. In Text summarization branches out, 74–81.
[Liu and Qin 2014] Liu, X., and Qin, J. 2014. An interactive metadata model for structural, descriptive, and referential representation of scholarly output. Journal of the Association for Information Science and Technology 65(5):964–
[Mihalcea and Tarau 2004] Mihalcea, R., and Tarau, P. 2004. Textrank: Bringing order into text. In Proceedings of the 2004 conference on empirical methods in natural language processing, 404–411.
[Narayan, Cohen, and Lapata 2018] Narayan, S.; Cohen, S. B.; and Lapata, M. 2018. Don’t give me the details, just the summary! topic-aware convolutional neural networks for extreme summarization. ACL.
[Nishikawa et al. 2014] Nishikawa, H.; Arita, K.; Tanaka, K.; Hirao, T.; Makino, T.; and Matsuo, Y. 2014. Learning to generate coherent summary with discriminative hidden semi-markov model. In Proceedings of COLING 2014, 1648–1659.
[Papineni et al. 2002] Papineni, K.; Roukos, S.; Ward, T.; and Zhu, W.-J. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting on association for computational linguistics, 311–318.
[Roy, Upadhyay, and Roth 2016] Roy, S.; Upadhyay, S.; and Roth, D. 2016. Equation parsing: Mapping sentences to grounded equations. EMNLP.
[Schubotz et al. 2016] Schubotz, M.; Grigorev, A.; Leich, M.; Cohl, H. S.; Meuschke, N.; Gipp, B.; Youssef, A. S.; and Markl, V. 2016. Semantification of identifiers in mathematics for better math information retrieval. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, 135–144. ACM.
[See, Liu, and Manning 2017] See, A.; Liu, P. J.; and Manning, C. D. 2017. Get to the point: Summarization with pointer-generator networks. ACL.
[Tan, Wan, and Xiao 2017a] Tan, J.; Wan, X.; and Xiao, J. 2017a. Abstractive document summarization with a graph-based attentional neural model. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 1171–1181.
[Tan, Wan, and Xiao 2017b] Tan, J.; Wan, X.; and Xiao, J. 2017b. From neural sentence summarization to headline generation: A coarse-to-fine approach. In IJCAI, 4109–
[Vaswani et al. 2017] Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; and Polosukhin, I. 2017. Attention is all you need. In Advances in neural information processing systems, 5998–6008.
[Wang et al. 2018] Wang, L.; Zhang, D.; Gao, L.; Song, J.; Guo, L.; and Shen, H. T. 2018. Mathdqn: Solving arithmetic word problems via deep reinforcement learning. In Thirty-Second AAAI Conference on Artificial Intelligence.
[Wu et al. 2018] Wu, J.-W.; Yin, F.; Zhang, Y.-M.; Zhang, X.-Y.; and Liu, C.-L. 2018. Image-to-markup generation via paired adversarial learning. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 18–34. Springer.
[Yasunaga and Lafferty 2019] Yasunaga, M., and Lafferty, J.