Maryam Sadat MIRZAEI







Education

  • 04/2014 – 03/2017: Ph.D. in Informatics, Department of Intelligence Science and Technology, Kyoto University, Kyoto, Japan (under the supervision of Professor Tatsuya Kawahara)
  • 04/2012 – 03/2014: M.Sc. in Informatics, Department of Intelligence Science and Technology, Kyoto University, Kyoto, Japan (under the supervision of Professor Tatsuya Kawahara)
  • 10/2010 – 03/2012: Research Student focusing on Computer-Assisted Language Learning, Department of Human and Environmental Studies, Kyoto University, Kyoto, Japan (under the supervision of Professor Mark Peterson)
  • 10/2005 – 01/2009: B.A. in English Language (Translation track), Faculty of Mortal Literature and Science, Teacher Training University, Tehran, Iran


Work Experience

  • 06/2017 – present: Postdoctoral Researcher, RIKEN AIP, Tokyo, Japan (Guest Researcher at Kyoto University under the supervision of Professor Toyoaki Nishida)
  • 07/2006 – 09/2010: English Teacher, Iran Language Institute (ILI), Tehran, Iran

Research Interests

  • Human-AI Communication
  • Human-Computer Interaction
  • Computer-Assisted Language Learning
  • Second Language Acquisition (SLA)
  • Computational Linguistics
  • Human Language Technology
  • Computer-Mediated Communication (CMC)
  • Information and Communication Technology
  • Distance Learning
  • Educational Technology


Publications

  • Mirzaei, M. S., Meshgi, K., & Kawahara, T. (2017). Exploiting Automatic Speech Recognition Errors to Enhance Partial and Synchronized Caption for Facilitating Second Language Listening. Computer Speech and Language Journal, Elsevier (DOI).
  • Mirzaei, M. S., Meshgi, K., & Kawahara, T. (2017). Partial and synchronized captioning: A new tool to assist learners in developing second language listening skill. ReCALL Journal, Cambridge University Press (DOI).
  • Mirzaei, M. S., Meshgi, K., & Kawahara, T. (2017). Listening difficulty detection to foster second language listening with the Partial and Synchronized Caption system. In CALL in a Climate of Change – Proceedings of the 2017 EUROCALL Conf., Southampton, U.K. (DOI).
  • Mirzaei, M. S., & Kawahara, T. (2017). Detecting listening difficulty for second language learners using Automatic Speech Recognition errors. In O. Engwall, J. Lopes, & I. Leite (Eds.), Proceedings of the Interspeech 2017 Workshop on Speech and Language Technology in Education (SLaTE 2017), Stockholm, Sweden. (DOI)
  • Mirzaei, M. S., Meshgi, K., & Kawahara, T. (2016). Leveraging automatic speech recognition errors to detect challenging speech segments in TED talks. In CALL Communities and Culture – Proceedings of the 2016 EUROCALL Conf., Limassol, Cyprus. (DOI).
  • Meshgi, K., Mirzaei, M. S., Oba, S., & Ishii, S. (2017). Efficient Asymmetric Co-Tracking using Uncertainty Sampling. In Proceedings of IEEE ICSIPA’17, Kuching, Malaysia (Best paper award).
  • Meshgi, K., Mirzaei, M. S., Oba, S., & Ishii, S. (2017). Active Collaborative Ensemble Tracking. In Proceedings of IEEE AVSS’17, Lecce, Italy.
  • Mirzaei, M. S., Meshgi, K., & Kawahara, T. (2016). Automatic Speech Recognition Errors as a Predictor of L2 Listening Difficulties. In Proceedings of the COLING 2016 Workshop on Computational Linguistics for Linguistic Complexity (CL4LC 2016), Osaka, Japan. (DOI)
  • Mirzaei, M. S., Meshgi, K., Akita, Y., & Kawahara, T. (2015). Errors in automatic speech recognition versus difficulties in second language listening. In F. Helm, L. Bradley, M. Guarda, & S. Thouësny (Eds.), Critical CALL – Proceedings of the 2015 EUROCALL Conference, Padova, Italy (pp. 410-415). Dublin: Research- (DOI).
  • Mirzaei, M. S., & Kawahara, T. (2015). ASR Technology to Empower Partial and Synchronized Caption for L2 Listening Development. In S. Steidl, A. Batliner, & O. Jokisch (Eds.), Proceedings of the Interspeech 2015 Workshop on Speech and Language Technology in Education (SLaTE 2015), Leipzig, Germany. (DOI)
  • Mirzaei, M. S., Akita, Y., & Kawahara, T. (2014). Partial and Synchronized Captioning: A New Tool for Second Language Listening Development. In S. Jager, L. Bradley, E. J. Meima, & S. Thouësny (Eds.), CALL Design: Principles and Practice; Proceedings of the 2014 EUROCALL Conf., Groningen, The Netherlands (pp. 230-236). Dublin: (DOI)
  • Mirzaei, M. S., Akita, Y., & Kawahara, T. (2014). Partial and Synchronized Caption Generation to Develop Second Language Listening Skill. In Workshop Proceedings of the 22nd International Conference on Computers in Education ICCE, Nara, Japan (pp. 13-23). (DOI)


  * These works are cited in the Intralingual Subtitles (Captions) and Foreign/Second Language Acquisition Research Bibliography by Professor Gunter Burgers.






Current Research

I am currently working on the “Partial and Synchronized Caption” (PSC) project, with the aim of advancing the system and incorporating it into a CALL system. This ongoing study investigates PSC, a novel captioning method, as a listening tool for second language (L2) learners. In a partial and synchronized caption, only a selected set of words appears, and each word is synchronized with its corresponding speech signal.

The approach relies on recent advances in speech recognition technology: an automatic speech recognition system (Julius 4.3) was trained on the desired corpora to precisely align each word with its corresponding speech signal. This word-level alignment emulates the speech flow and allows for text-to-speech mapping. The alignment output is then used to generate partial captions by automatically selecting the words and phrases that are likely to hinder the learner’s listening comprehension. The selected words are shown in the caption, while the rest are masked by dots so that comprehension relies more on listening to the speech than on reading the caption text.

Selection is governed by three features: “speech rate”, “word frequency” and “specificity”. The method rests on the premise that infrequent words and fast delivery impair L2 listening comprehension, so PSC assists learners by presenting these problematic words and phrases in the caption. For effective word selection, the learner’s vocabulary size and tolerable rate of speech are used as the basis for generating the captions. PSCs are automatically prepared for learners at three proficiency levels: “beginner”, “pre-intermediate” and “intermediate”.

This type of captioning is expected to serve not only as an assistive tool that enhances L2 learners’ listening comprehension skills but also as a means of reducing dependence on captions, thus preparing learners for real-world situations. Demonstration videos showing how PSC works are available on the official project page.
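
The word-selection and masking steps described above can be illustrated with a minimal Python sketch. It is not the PSC implementation: the class names, the per-word syllable rate, the frequency-rank cutoff, and the boolean specificity flag are all assumptions made for illustration, and the timings stand in for the word-level alignment that the ASR system would provide.

```python
from dataclasses import dataclass


@dataclass
class AlignedWord:
    """One word with its aligned time span (all fields are illustrative)."""
    text: str
    start: float               # seconds, assumed to come from word-level alignment
    end: float
    syllables: int
    freq_rank: int             # rank in a frequency list, 1 = most frequent (assumed)
    is_specific: bool = False  # assumed flag for specific (e.g. domain) terms


@dataclass
class LearnerProfile:
    """Hypothetical learner parameters used as selection thresholds."""
    vocab_size: int            # words ranked beyond this are treated as unknown
    max_speech_rate: float     # tolerable rate in syllables per second


def speech_rate(word: AlignedWord) -> float:
    """Syllables per second over the word's aligned duration."""
    duration = word.end - word.start
    return word.syllables / duration if duration > 0 else float("inf")


def should_show(word: AlignedWord, learner: LearnerProfile) -> bool:
    """Show a word if it is infrequent, spoken too fast, or specific."""
    return (
        word.freq_rank > learner.vocab_size
        or speech_rate(word) > learner.max_speech_rate
        or word.is_specific
    )


def mask(text: str) -> str:
    """Replace letters and digits with dots, keeping punctuation."""
    return "".join("." if ch.isalnum() else ch for ch in text)


def partial_caption_cues(words, learner):
    """Return (start, end, token) cues so tokens can be shown in sync with speech."""
    return [
        (w.start, w.end, w.text if should_show(w, learner) else mask(w.text))
        for w in words
    ]


words = [
    AlignedWord("the", 0.00, 0.15, syllables=1, freq_rank=1),
    AlignedWord("ubiquitous", 0.15, 0.80, syllables=4, freq_rank=12000),
    AlignedWord("phone", 0.80, 1.20, syllables=1, freq_rank=900),
]
learner = LearnerProfile(vocab_size=3000, max_speech_rate=7.0)
caption = " ".join(token for _, _, token in partial_caption_cues(words, learner))
print(caption)  # ... ubiquitous .....
```

In this sketch a smaller `vocab_size` or a lower `max_speech_rate` makes more words visible, which is one way the three proficiency levels could map to different captions for the same video.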

Contact Information

Email: mirzaei [at] ii [dot] ist [dot] i [dot] kyoto-u [dot] ac [dot] jp
Address: Room 217, Engineering Building No.7, Kyoto University,
Yoshida-honmachi, Sakyo-ku, Kyoto, Japan
Phone: +81 80-3816-1366