Comparison of feature-based sentence ranking methods for extractive summarization of Turkish news texts

Ertürk Erdağı; Volkan Tunali

Research Article

Year 2024, Volume: 42 Issue: 2, 321 - 334, 30.04.2024

Abstract

Document summarization is the task of generating a shorter form of a document that retains its important information content. Automatic text summarization has been developed for this task and is still widely used. It is divided into two main types: extractive summarization and abstractive summarization. In this experimental study, we used sentence ranking methods for extractive summarization of Turkish news texts. We used summarization rates of 20%, 30%, 40%, 50%, and 60%. Summarization results were evaluated with the ROUGE and BLEU metrics. We proposed new methods based on major vowel harmony and minor vowel harmony features, and obtained high evaluation results on both ROUGE and BLEU with these features. Additionally, we studied a hybrid model that uses the major and minor vowel harmony rules together. We obtained the best results with these three methods: major vowel harmony, minor vowel harmony, and the hybrid model. We compared the three proposed methods with the BERTurk model, which was prepared for Turkish based on Google BERT. The proposed methods gave results very close to this state-of-the-art method, which suggests that they are worth developing further.
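The abstract does not state how the vowel harmony features are turned into sentence scores, so the following Python sketch is only an illustration under stated assumptions, not the authors' implementation. It assumes a sentence is scored by the fraction of its words that obey a harmony rule, and that a summarization rate keeps that percentage of the top-ranked sentences in their original order. The harmony rules themselves follow standard Turkish grammar; all function names and the example sentences are hypothetical.

```python
import re

BACK = set("aıou")        # back vowels
FRONT = set("eiöü")       # front vowels
VOWELS = BACK | FRONT
UNROUNDED = set("aeıi")   # unrounded vowels


def tr_lower(text):
    # Turkish lowercases "I" to dotless "ı" and "İ" to "i";
    # plain str.lower() would get both wrong for this purpose.
    return text.replace("I", "ı").replace("İ", "i").lower()


def vowels_of(word):
    return [ch for ch in tr_lower(word) if ch in VOWELS]


def obeys_major(word):
    # Major vowel harmony: all vowels are back, or all are front.
    v = vowels_of(word)
    return all(c in BACK for c in v) or all(c in FRONT for c in v)


def obeys_minor(word):
    # Minor vowel harmony: an unrounded vowel is followed by an unrounded
    # vowel; a rounded vowel is followed by a/e or by u/ü.
    v = vowels_of(word)
    for prev, nxt in zip(v, v[1:]):
        allowed = UNROUNDED if prev in UNROUNDED else set("aeuü")
        if nxt not in allowed:
            return False
    return True


def obeys_both(word):
    # Hybrid rule (assumption): the word must satisfy both harmony rules.
    return obeys_major(word) and obeys_minor(word)


def harmony_score(sentence, rule):
    # Assumed scoring: share of words in the sentence that obey the rule.
    words = re.findall(r"\w+", sentence)
    if not words:
        return 0.0
    return sum(rule(w) for w in words) / len(words)


def summarize(sentences, rate_percent, rule):
    # Keep rate_percent of the sentences (at least one), chosen by score,
    # and return them in their original document order.
    k = max(1, len(sentences) * rate_percent // 100)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: harmony_score(sentences[i], rule),
                    reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


sentences = [
    "Okullar açıldı.",
    "Doktor kitap aldı.",
    "Evler güzel.",
    "Televizyon bozuldu.",
]
print(summarize(sentences, 50, obeys_major))
# → ['Okullar açıldı.', 'Evler güzel.']
```

Passing `obeys_minor` or `obeys_both` as the `rule` argument gives the minor-harmony and hybrid variants of the same ranking; the integer `rate_percent` corresponds to the 20% to 60% rates mentioned above.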

Keywords

Summarization, Extractive, Sentence Ranking, Major Vowel, Minor Vowel

Details

Primary Language	English
Subjects	Structural Biology
Section	Research Articles
Authors	Ertürk Erdağı 0000-0001-8619-8879 Volkan Tunali 0000-0002-2735-7996
Publication Date	April 30, 2024
Submission Date	April 27, 2022
Published in Issue	Year 2024 Volume: 42 Issue: 2

Cite

Vancouver	Erdağı E, Tunali V. Comparison of feature-based sentence ranking methods for extractive summarization of Turkish news texts. SIGMA. 2024;42(2):321-34.
