On Gamifying the Transcription of Digital Video Lectures

Games can be used to exploit the computational power of humans to perform tasks that are difficult for computers. One of these difficult tasks is the transcription of video lectures. Indeed, the characteristics of the speech that occurs in video lectures are not well suited for speech recognition technologies. In this paper we propose ALGA, an ALtruistic GAme, designed to involve students in the production of transcripts. Players challenge each other by listening to short, randomly selected pieces of the audio stream, and by submitting the corresponding transcription. When two players (unknown to each other) submit the same version, the transcript of the audio chunk is considered correct and the players gain points. To motivate players, ALGA provides the final transcript to all the players and maintains a high-score list for every video lecture. The evaluation shows that the accuracy of the obtained transcripts is higher than the one obtained by speech recognition technologies and also shows that participants like the game approach. Hence, ALGA can be considered a reasonable, feasible and affordable solution to produce transcripts from video lectures.


Introduction
Many private and public educational institutes use video lectures to improve the effectiveness of teaching in and out of classrooms and to support using video games as ideal companions to classroom instruction is unquestionable [13,14]. Furthermore, the use of game elements to motivate learning is seen as "a serious approach to accelerate the curve of the learning experience, teach complex subjects, and systems of thought" [15]. For these reasons, computer games are increasingly used in various learning scenarios, e.g., classroom education, government, financial services, healthcare, science, telecommunications, corporate and military training [16].
In this paper, the game we propose to involve students in the transcription process falls in the area of Games With A Purpose (GWAP) [17]. GWAPs are used to exploit the computational power of humans to perform tasks that are easy for humans, but difficult for computers. An example of a GWAP is ESP [18], a game designed to label pictures: it is well known that understanding the semantics of an image is very difficult for computers, whereas the task is quite easy for humans. Therefore, ESP gamified this process by displaying the same picture to two players (unknown to each other) and by asking them to label the picture. If the players submitted the same label for the picture, the label was considered appropriate.
Our proposal aims at gamifying the transcription process of video lectures with an approach that requires neither speech recognition technologies, nor professional copy editing, nor monetary incentives to motivate players. All ALGA needs is the audio stream of the video lecture and the students playing.
Briefly, ALGA automatically splits the audio stream of the video lecture into several short audio chunks (5-20 seconds long) and randomly presents one of these chunks to a player, asking him/her to submit a transcription of it. If the entered text matches a previously submitted text, the player (and the one who submitted the matching text) gains points. After playing with a chunk, the player can move on to another one and can play as many chunks as he/she wants. A high-score list is maintained for each video lecture and, when a player is overtaken in the ranking, he/she is informed (e.g., through e-mail) in order to encourage him/her to play again. The transcript is made available to all the players when the percentage of correctly transcribed audio chunks is above a threshold (defined by the game administrator, usually the video lecture speaker).
In addition to transcription, ALGA lets players play with links to external resources: players can suggest resources that contain material to deepen the study of the topic(s) covered in the video lecture. In this case too, if the entered link matches a previously submitted link, the player (and the one who submitted the matching link) gains points.
The evaluation of the proposed game shows that the game approach is appreciated by students and also shows that the accuracy of the produced transcript is higher than the one obtained through ASR technologies. Therefore, ALGA can be considered a reasonable, feasible and affordable solution to produce transcripts from video lectures.
The remainder of the paper is organized as follows: Section 2 presents related work in the area of transcript correction; Section 3 shows details of the ALGA proposal, whereas Section 4 shows the ALGA evaluation study. Conclusions are drawn in Section 5.

Related Work
The gamification of the transcription process of digital video lectures involves different fields: the use of games in education, the gamification approach, the transcription of digital lectures and the game with a purpose scenario. In the following, we overview studies in these different fields.

Serious Games in Education
In the past years, different studies showed that well-designed computer educational games provide undeniable benefits and might be well suited for active learning, as they provide a learning environment able to foster the higher-order knowledge and skills of students [13,14,15,16]. Indeed, most computer games are active, experiential and interactive, and these features are those that most influence effective learning [13]. Therefore, it is not surprising that games are being employed more and more in learning environments (e.g., classroom education, financial services, healthcare, military training, etc.) [16,19]. For instance, Corti [20] analyzed the use of serious games in the learning environment and found that the benefits of using video games are unquestionable. Hwang et al. [21] showed that games are perceived as a means to engage players in enjoyable activities to accomplish some challenging objectives. Wrzesien and Alcañiz Raya [22] indicated three main reasons for the ever-increasing use of serious games in education: (a) they use actions rather than explanations and create personal motivation and satisfaction, (b) they accommodate multiple learning styles and abilities, and (c) they foster decision-making and problem-solving activities in a virtual setting. Gentile and Gentile [23] observed that the use of computer games in the learning environment brings happiness and a sense of achievement to learners, thus helping them to improve their learning results and stimulating them to think; these advantages suggest that computer games could be applied to improve traditional methods of teaching. Ebner and Holzinger [24] found that students who use games in the learning environment produce learning results that surpass those of traditional teaching methods. Finally, we observe that researchers have developed computer games to improve learning in diverse disciplines, such as mathematics, computer science and linguistics (e.g., [25,26,27], just to name a few).
However, it is worth mentioning that the use of serious games in the educational scenario is also subject to criticism. Guillen et al. [16] highlighted that there is still little consensus on the process by which games engage learners and on the types of learning outcomes that can be achieved through game play. Connelly et al. [13] observed that it is difficult to understand the effects of games. Lin et al. [19] observed the difficulties in developing effective games for the learning environment.
Far from settling the debate, we observe that a recent trend in education is the use of the so-called "gamification" approach.

The Gamification Approach
Since the initial proposal of using computer games in the educational scenario, the use of games for learning purposes has evolved and recently the "gamification" process has gained significant attention [28]. This process refers to "the use of game design elements in non-game contexts" and is applied not only to education, but also to many other scenarios like healthcare or production.
With respect to the educational scenario, the gamification approach is increasingly used for teaching students, training people, engaging users and balancing difficulties and abilities [29,30,31]. In particular, Barata et al. [32] explored how gamification can be applied to education in order to improve student engagement: they gamified a college course and compared it against a non-gamified one. Results showed that students who attended the gamified course had significant improvements in terms of attention to reference materials, online participation and proactivity. Moreno [33] investigated the gamification of programming and showed that students who used a video game to improve their programming skills performed 12% better in their final exam than students who did not. Iosup and Epema [15] experimented with a teaching technique that used social gaming elements to deliver higher education: they found that gamification is correlated with an increase in the percentage of passing students, and in the participation in voluntary activities and challenging assignments. Furthermore, they found that gamification also fostered interaction in the classroom and triggered students to pay more attention to the design of the course. Sepehr and Head [34] examined the effect of gamification techniques in engaging students in a teaching context; in particular, they analyzed the influence of competition, and results showed that competition is a key element that highly motivates students to engage in the gamification tasks. Gené et al. [28] analyzed gamification in Massive Open Online Courses (MOOCs) and proposed a model to motivate MOOC students based on game elements: they proposed to rank students, to show course progress, to provide a course certification and to introduce "like" features in the MOOC courses. O'Donovan et al. [35] discussed positive and negative aspects of gamifying a university course and showed evidence that gamification improves student engagement and understanding.

Transcription of Digital Lectures
The Web is full of video lectures, tutorials, reviews, reports and interviews that are not accessible to a large part of the population (e.g., hearing-impaired people, dyslexic students or non-native speakers), to machines (e.g., search engines) and to anyone who needs to search, review or translate what is said in the video [36]. Without doubt, the availability of transcripts would provide several benefits (e.g., improve comprehension, reduce deficiencies, improve indexing, etc.) and currently there are two main approaches to produce transcripts: manual or automatic. The former approach involves professionals who usually guarantee accuracies as high as 99%, for fees as low as $1 per minute of transcribed text [8], whereas the latter approach requires the use of ASR software that may achieve very high accuracy (up to 99%) in an ideal scenario (e.g., good acoustic environment and microphone quality), but when used in the classroom scenario the accuracy may drop considerably (below 70%) and therefore manual correction is required [6].
Knight and Almeroth [37] proposed AutoCap, a software tool designed to produce time-coded transcripts through ASR technologies, but the approach requires experts to correct the transcript produced by the ASR. Wald [7,36] described a tool that facilitates crowdsourced correction of speech recognition captioning errors. Starting from a time-coded transcript, the tool allows users (one at a time) to correct possible errors in the transcript and the system stores the modified versions of the transcript. Luz et al. [38] proposed a 3D "transcription game" that displays sentence transcription candidates through animated 3D representations of word lattices generated by speech recognition. The user can interact with these sentence representations by selecting the correct paths as the words move towards the background. Novotney and Callison-Burch [39] suggested partitioning audio files into 5-second segments and posting tasks on the Amazon Mechanical Turk platform, where Turk workers transcribe the audio segments for a small fee. The best transcripts were 13% lower in word accuracy. Liem et al. [8] used a similar approach to improve existing transcripts, but instead of using the Amazon Mechanical Turk platform, they involved students. They proposed to partition the audio track into 10-second segments and to process these segments independently. The mechanism uses ASR software to generate the transcript and asks students to correct it.
Recently, different proposals introduced the game approach in the transcription process, which places them in the Game With A Purpose category.

Games With A Purpose
Games With A Purpose (GWAP) are games that exploit the computational power of humans to perform tasks that are easy (and somehow entertaining) for humans, but difficult for computers. There are many examples of these tasks, from labeling pictures to decoding the code for genetic diseases, from understanding the perceived colors of a picture to unfolding RNA molecules.
The first known example of a GWAP is ESP, a game designed to label images [18]. This task is very difficult for computers, but quite easy for humans. Therefore, the idea of the game is to make the labeling process entertaining, by displaying the same picture to two players, randomly picked and unknown to each other, and by asking them to reach an agreement on a label for the picture. Another task that is difficult for computers is the ranking of images, and Lux et al. [40] proposed to introduce game elements in this task.
With respect to the transcription process, Kacorri et al. [9] proposed a caption editing system based on crowdsourced work. The audio is first transcribed with the IBM Attila speech recognition engine and is then split into segments of 2-10 seconds. Users have to identify the correct captions for each video segment. To make the task more entertaining, users watch the video segment and have to enter the correct caption while playing against a countdown timer. To obtain the final caption, the proposal aligns and merges all the captions submitted by users. Preliminary results with 42 participants and 578 short video segments showed that the accuracy increased by 4% with respect to the one achieved by the ASR software alone.

The ALGA Proposal
To produce transcripts from video lectures without the use of speech recognition technologies, professional copy editing or monetary incentives to motivate non-professional workers, we consider the GWAP approach and we propose the gamification of the transcription process of video lectures. In particular, we propose ALGA, an ALtruistic GAme designed to involve students in the transcription of video lectures.
In ALGA, students can challenge each other in two different play modes: textual and link. In the textual mode the goal is to produce the video lecture transcript, whereas in the link mode the goal is to associate with the video lecture external resources that may help students to deepen the study of the topic(s) covered in the lesson. In particular, in the textual mode, the player has to listen to a piece of the audio lesson and has to submit the corresponding transcription. The length of the audio chunk is determined by an audio segmentation process that splits the audio according to the audio characteristics (note that, from an experimental investigation, see Section 4, the length of the audio chunk varies between 5 and 20 seconds). The splitting is done to ease the transcription process, as players only need to listen to and transcribe a short piece of audio. Indeed, since a person speaks 150-170 words per minute, the transcription of an audio chunk of 5-20 seconds requires the player to write around 13-50 words. When the entered text matches a previously submitted text, the player (and the one who submitted the matching text) gains points. After playing with an audio chunk, the player can move on to another one and can play as many chunks as he/she wants. In the link mode, the player has to submit a URL to an external resource that he/she thinks might be useful to explore the topic covered in the lesson. When the entered URL matches a previously submitted URL, the player (and the one who submitted the matching URL) gains points.
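As a quick check of the word-count estimate, the number of words $w$ in a chunk of $d$ seconds spoken at $r$ words per minute is $w = r d / 60$. The symbols $w$, $r$ and $d$ are introduced here only for illustration; evaluating the corners of the two quoted ranges gives:

```latex
\begin{align*}
w &= \frac{r \, d}{60}, \\
w(150, 5) &= 12.5, & w(150, 20) &= 50, \\
w(170, 5) &\approx 14.2, & w(170, 20) &\approx 56.7 .
\end{align*}
```

The quoted range of about 13 to 50 words therefore corresponds to a speaker at 150 words per minute; at 170 words per minute a 20-second chunk would hold about 57 words.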
Since a game needs to be interesting and engaging [41], ALGA motivates students with two different rewarding schemes: i) the entire transcript of the video lecture and ii) a high-score list for every video lecture. While the transcript of a lecture is clearly useful to students, high-score lists are used because they are perceived as a reward [21,42,43]. For this reason, when a player reaches the first position, he/she is nominated the current "Master of the video lecture" and, at the same time, the previous "Master" is informed through e-mail that he/she is no longer at the top of the high-score list. This works as a stimulus to play again.
ALGA also provides the figure of a director for each course (i.e., a group of video lectures): a special user (e.g., the one who speaks in the video lecture, or a manager appointed by the content provider) who may transcribe or provide links without the need for confirmation. Also, the director has to specify the threshold (e.g., 90%) beyond which the transcript can be released to the players.
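The release rule set by the director can be illustrated with a few lines of Python. This is only a sketch: the function name `transcript_released`, the string labels for the chunk states and the strict "greater than" comparison (read from "beyond which") are assumptions, not details of the ALGA prototype.

```python
def transcript_released(chunk_states, threshold=0.90):
    """Return True once the share of checked chunks is beyond the threshold.

    chunk_states: one state label per audio chunk ("checked" or "stand-by").
    threshold: fraction chosen by the director (0.90 mirrors the 90% example).
    """
    if not chunk_states:
        return False  # nothing transcribed yet, nothing to release
    checked = sum(1 for state in chunk_states if state == "checked")
    return checked / len(chunk_states) > threshold
```

With ten chunks, nine checked chunks sit exactly on a 90% threshold and would not yet trigger the release under this reading; a non-strict comparison would be an equally plausible interpretation.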
In the following, we present the details of the game rules, the way the game engine operates and an example of the game. An ALGA prototype will be presented in the next section.

The Rules of The Game
Object of the Game
Become "Master of the video lecture" by collecting more points than other players. Points are gained when a correct transcription or a useful link is entered. To be considered correct, the transcript must be submitted by two different players (or submitted by a player and confirmed by the director). The same applies to the link to be considered useful.

Game Reward
The video lecture transcript.

Game Setup
After login (only registered users may play), a player has to select the course topic (e.g., Linguistics, Communication Technology, Computer Networks), the specific video lecture he/she wants to play with and the play modality. The course topic and the video lecture can be selected among the ones available, whereas the play modality may be either Textual or Link.
1. Textual: a player has to listen to an audio chunk (randomly selected by the system) and has to write the textual transcription. If the submitted transcription matches a transcription submitted by another player, both players gain one point. If the transcription does not match any other transcription, but it is confirmed by the director, the player gains one point.
2. Link: a player has to suggest a link to a Web resource for the entire video lecture he/she is playing with. The player gains ten points if the submitted link matches a link submitted by another player or if the director confirms the link is useful for the topic(s) covered in the video lecture.

The Game Engine
ALGA considers a video lecture as composed of three components: the audio stream ($A$), the audio transcript ($T$) and the links ($L$). In particular, the audio file is logically seen as a sequence of $N$ audio chunks, $A = \{a_1, a_2, \ldots, a_i, \ldots, a_N\}$; the transcript $T$ is considered as a sequence of $N$ pieces, $T = \{t_1, t_2, \ldots, t_i, \ldots, t_N\}$; and the link set $L$ contains all the external resources associated with the video lecture, $L = \{l_1, l_2, \ldots, l_i, \ldots, l_K\}$. Initially, $T$ and $L$ are empty and will be filled by the players during play.
Each $t_i$ contains the following information:
• $s\_time_i$: the initial audio point of the audio chunk $a_i$ with respect to the entire audio stream;
• $e\_time_i$: the final audio point of the audio chunk $a_i$ with respect to the entire audio stream;
• $text_i$: the textual transcription of what is said in chunk $a_i$;
• $state_i$: the current state of the chunk $t_i$;
• $player_j$: a tuple containing the IDs of the players who entered $text_i$ (i.e., $player_j = (p_1, p_2)$).
Each link $l_i$ contains the following information:
• $url_i$: the URL suggested for the video lecture;
• $state_i$: the current state of the link $l_i$;
• $player_j$: a tuple containing the IDs of the players who entered $url_i$ (i.e., $player_j = (p_1, p_2)$).
The state of $t_i$ can be one of the following (the transitions between states are depicted in Figure 1):
• Stand-by: the transcript of the audio chunk has found no confirmation and therefore, from the game's point of view, the actual transcript of chunk $a_i$ is still to be determined;
• Checked: two players submitted the same transcript of the audio chunk, or the transcript submitted by a player was confirmed by the director, or the director directly wrote the audio chunk transcription. In all these cases, from the game's point of view, $text_i$ contains what is actually said in the audio chunk $a_i$.
The state of $l_i$ can be one of the following:
• L-Stand-by: the link has found no confirmation and therefore, from the game's point of view, the link contains material not yet verified either by other players or by the director;
• Enriched: two players submitted the same link, or the link submitted by a player was confirmed by the director, or the director directly specified a link for the video lecture. In all these cases, the link is included in the $Link_i$ tuple.
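The records and states described above map naturally onto small data classes. The following Python sketch is illustrative only: the class names, the `confirm` method, the use of seconds for the time fields and the convention of passing no player ID for a director confirmation are all assumptions rather than details of the ALGA prototype.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class ChunkState(Enum):
    STAND_BY = "stand-by"   # no confirmation yet
    CHECKED = "checked"     # matched by a second player or confirmed by the director


class LinkState(Enum):
    L_STAND_BY = "L-stand-by"
    ENRICHED = "enriched"


@dataclass
class TranscriptEntry:
    s_time: float                        # start of chunk a_i (seconds: assumed unit)
    e_time: float                        # end of chunk a_i
    text: str                            # candidate transcription of a_i
    state: ChunkState = ChunkState.STAND_BY
    players: Tuple[str, ...] = ()        # IDs of the players who entered the text

    def confirm(self, second_player: str = "") -> None:
        """Stand-by -> Checked; pass no player when the director confirms."""
        if second_player:
            self.players += (second_player,)
        self.state = ChunkState.CHECKED


@dataclass
class LinkEntry:
    url: str                             # suggested external resource
    state: LinkState = LinkState.L_STAND_BY
    players: Tuple[str, ...] = ()        # IDs of the players who entered the URL

    def confirm(self, second_player: str = "") -> None:
        """L-Stand-by -> Enriched; pass no player when the director confirms."""
        if second_player:
            self.players += (second_player,)
        self.state = LinkState.ENRICHED
```

Both records only ever move from their initial state to the confirmed one, which matches the two-state description given for chunks and links.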
Depending on the game modality, the game engine operates as follows:
1. Textual Mode: the game server randomly selects a $t_i$ among the ones that are not in the checked state and presents the corresponding audio chunk $a_i$ to the player in its aural form. The player has to listen to the audio and has to submit the transcript. The submitted version is compared against all the versions in the stand-by status. If a match is found, the players that provided the same version gain one point, the status of the entry is changed from stand-by to checked and all the other versions of $t_i$ with stand-by status (if any) are removed. If there is no match, the state of $t_i$ is labeled as stand-by and the player does not gain points. Note that the director may change the status of any entry from stand-by to checked.
2. Link Mode: the player may suggest a link to an external web resource that he/she thinks is worth reading to deepen the topic(s) covered in the entire video lecture. Every submitted link is compared against all the links in the L-stand-by status. If a match is found, the players that provided the same link gain ten points and the state of the entry is changed from L-stand-by to enriched. If there is no match, the state of the suggested link is labeled as L-stand-by and the player does not gain points. In addition, the director may change the status from L-stand-by to enriched for every suggested link that he/she thinks is worth reading to deepen the topic(s) covered in the entire video lecture. Also in this case, the player who suggested the link gains ten points.
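The link-mode rules can be sketched in a few lines of Python. The function names `submit_link` and `director_confirm`, the dictionary layout and the exact string comparison of URLs are assumptions made for illustration; the text does not say whether URLs are normalized before being compared.

```python
LINK_POINTS = 10  # points awarded per player for a confirmed link


def submit_link(links, url, player):
    """Record a suggested URL; return {player_id: points} for this submission.

    links maps each URL to {"state": ..., "players": [...]} for one video lecture.
    """
    entry = links.get(url)
    if entry is None:
        # First suggestion of this URL: it waits for confirmation.
        links[url] = {"state": "L-stand-by", "players": [player]}
        return {}
    if entry["state"] == "L-stand-by" and player not in entry["players"]:
        # A second, different player suggested the same URL.
        entry["state"] = "enriched"
        entry["players"].append(player)
        return {p: LINK_POINTS for p in entry["players"]}
    return {}  # already enriched, or the same player repeating a suggestion


def director_confirm(links, url):
    """Director promotes a waiting link; its suggesting player gains points."""
    entry = links.get(url)
    if entry is None or entry["state"] != "L-stand-by":
        return {}
    entry["state"] = "enriched"
    return {p: LINK_POINTS for p in entry["players"]}
```

Requiring a different player for the match follows the rule that a link must be submitted by two different players to be considered useful.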

A Game Example
Let us suppose that the selected original audio chunk $a_i$ contains: "The Web is an Internet application that links millions of web pages" and suppose that the game is played by players $P_a$, $P_b$ and $P_c$.
Scenario 1: $P_a$ plays in the Textual mode and writes "The web is an Internet application that links millions of web pages". $P_b$ plays in the Textual mode and writes "The web is an application that links billions of web pages".
If there is no match with the entries labeled as stand-by, the following entries are stored in the DB:
• (..., "The web is an Internet application that links millions of web pages", stand-by, ..., ($P_a$))
• (..., "The web is an application that links billions of web pages", stand-by, ..., ($P_b$))
In this scenario, no points are given to the players.
Scenario 2: $P_c$ plays in the Textual mode and the DB stores the entries of Scenario 1. $P_c$ writes "The web is an Internet application that links millions of web pages". Since the $P_c$ version matches the one of $P_a$, the version is considered correct. As a consequence, the state of the entry is changed from stand-by to checked, whereas all the other versions labeled as stand-by (if any) are removed:
• (..., "The web is an Internet application that links millions of web pages", checked, ..., ($P_a$, $P_c$))
In this scenario, one point is given to players $P_a$ and $P_c$.
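The two scenarios can be replayed with a short sketch of the textual-mode matching rule. It is illustrative only: the function name `submit_transcript`, the dictionary layout and the use of exact string comparison are assumptions, since the text does not specify how two transcriptions are compared.

```python
def submit_transcript(entries, text, player):
    """Apply the textual-mode rule to one chunk t_i.

    entries: list of stored versions, each {"text", "state", "players"}.
    Returns the list of players who gain one point for this submission.
    """
    for entry in entries:
        if entry["state"] != "stand-by" or entry["text"] != text:
            continue
        if player in entry["players"]:
            return []  # a player cannot confirm his/her own version
        entry["state"] = "checked"
        entry["players"].append(player)
        entries[:] = [entry]  # drop all other stand-by versions
        return list(entry["players"])
    # No match: store the new version and wait for confirmation.
    entries.append({"text": text, "state": "stand-by", "players": [player]})
    return []


# Replay of Scenario 1 and Scenario 2.
entries = []
first = "The web is an Internet application that links millions of web pages"
second = "The web is an application that links billions of web pages"
submit_transcript(entries, first, "Pa")   # stored as stand-by, no points
submit_transcript(entries, second, "Pb")  # stored as stand-by, no points
winners = submit_transcript(entries, first, "Pc")
print(winners)  # ['Pa', 'Pc']
```

After the third submission only the checked entry remains, mirroring the single DB row listed in Scenario 2.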

Experimental Assessment
To evaluate ALGA, we developed a prototype version of the game and set up an experimental scenario composed of twelve different 45-minute video lectures taken from three different courses (Linguistics, Videocommunication Lab and Communication Technology). We involved 59 students and asked them to take part in the experiment.

The ALGA Prototype
The prototype is based on a web architecture (see Figure 2), so that the game can be enjoyed by players using any type of device. The back-end of the game is composed of a game engine that accesses the audio stream of the video lecture, splits it into several chunks and implements the rules of the game. A MySQL DB is used to keep track of the players' actions.
One of the most important components of the back-end is the one that splits the audio of the video lecture into several chunks. To this aim, we developed an audio splitting mechanism based on audio energy analysis, as this technique is easy to implement and has been widely used in the literature (e.g., [44,45,46,47]). In particular, for the sake of simplicity, we considered the audio stream as a sequence of $N$ virtual frames, $f_1 f_2 \ldots f_N$, where the time length of each virtual frame is fixed and equal to 30 milliseconds (a common time length), and then we computed the sound energy of each virtual frame with a formula in which $N$ is the number of audio samples within the frame and $pcm_i$ is the $i$-th audio sample of the considered frame. The goal of our mechanism is to identify locations where to split the audio stream. These locations must correspond to silence periods. The audio energy analysis identifies all the frames that can be considered silence. An important parameter that may affect the identification of the locations where to split the audio stream is the length of the silence. Indeed, if we consider very short silences, the mechanism will likely identify as "silence periods" the short silences located between syllables of the same word, and this will likely cause our mechanism to split the audio in the middle of a word. For this reason, in our experimental scenario we consider different values of the silence length: from 30 milliseconds (shorter silences are not worth investigating as they will likely correspond to silence between syllables of the same word) [48] to 90 milliseconds. Another important parameter required by our mechanism is the length of the audio chunk that players will need to transcribe. To this aim, we performed an experimental assessment asking participants to listen to and transcribe chunks of different lengths (5, 10, 15 and 20 seconds). After that, we asked them for their preferred length: results showed that the majority of them (87%) agreed on the 5-second length.
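An energy-based silence detector of the kind described here can be sketched in plain Python. The sketch is hypothetical in several respects: it assumes a root-mean-square energy per frame (which may differ from the formula used in the prototype), a caller-supplied `silence_threshold`, and split points placed at the midpoint of each silence run; how the prototype combines silences with the target chunk length is not covered.

```python
import math


def frame_energies(samples, sample_rate, frame_ms=30):
    """Energy of consecutive fixed-length frames (RMS energy is an assumption)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies


def split_points(energies, silence_threshold, min_silence_frames=1, frame_ms=30):
    """Return candidate split times (ms), one per sufficiently long silence run.

    A frame counts as silence when its energy is below silence_threshold
    (a hypothetical tuning parameter). With 30 ms frames, min_silence_frames
    of 1, 2 and 3 corresponds to silence lengths of 30, 60 and 90 ms.
    """
    points = []
    run_start = None
    # The trailing sentinel closes a silence run that reaches the end.
    for i, energy in enumerate(list(energies) + [float("inf")]):
        if energy < silence_threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_silence_frames:
                points.append((run_start + i) * frame_ms / 2)  # midpoint of the run
            run_start = None
    return points
```

Raising `min_silence_frames` discards the short pauses between syllables, at the cost of fewer split points and therefore longer chunks.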
We applied the developed audio splitting mechanism to the video lectures by varying the silence length; the characteristics of the obtained chunks are reported in Table 1. These characteristics show that the shorter the silence length is, the shorter the average audio chunk length is. The reason is that short silences are more frequent than longer silences and therefore it is possible to split the audio more frequently (in this case close to the 5-second length suggested by participants). Note that the average number of words per chunk is estimated by considering that speech has around 150-170 words per minute.
With respect to the front-end, ALGA requires the user to log in and to select the course he/she wants to play with and the play mode (i.e., Textual or Link). This is done by selecting the course degree (left column of Figure 3) and the course name (central part of Figure 3). After that, a pop-up window asks the user for the play modality. Once the course and the play mode are selected, the player has to select the video lecture he/she wants to play with (left part of Figure 4). Once selected, the game is ready to play. Figure 4 shows an example of the textual mode: the player has to listen to the audio chunk and has to write the transcription in the box. After that, the player is notified whether he/she gained the point, is informed about his/her current position in the high-score list and is asked whether he/she wants to play again. Figure 6 shows an example of the personal profile page: here the player can keep track of his/her position in the various video lectures. For instance, in the ranking page he/she can observe the current ranking in all the video lectures he/she played with.

Level of Participation
After one week we observed that the level of participation was considerable. In particular, each chunk obtained with a silence length of 30 milliseconds was played 3.09 times; each chunk obtained with a silence length of 60 milliseconds was played 3.2 times; each chunk obtained with a silence length of 90 milliseconds was played 3.4 times. On average, each player played 28 chunks per day. With respect to the link mode, on average, 17 different links to external resources were submitted per video lecture and 1.5 of them (18 links across the 12 video lectures) were considered useful to deepen the topics covered in the video lecture.

Participants' Experience
To investigate the participants' experience, we considered a Mean Opinion Score (MOS) evaluation and we asked them to fill in a questionnaire using a 5-point Likert scale. In particular, we asked them four different questions:
Q1: Evaluate the game in general.
Q2: Was the game fun to play?
Q3: Was the final transcript useful?
Q4: Did the final transcript meet your expectations?
Results, presented in Figure 6, show that the game approach was considered interesting. The only score below 3 is the one related to the expectation of the final transcript. We asked participants to explain their score and most of them expected a book chapter with figures and formatted text, rather than a plain transcript. Hence, the low score was mainly due to a misunderstanding of the term "transcript".

Transcript Accuracy
The accuracy of the transcripts produced with ALGA was compared against a transcription automatically obtained through Dragon NaturallySpeaking v.11, one of the most accurate ASR (Automatic Speech Recognition) applications available among the off-the-shelf ones.
On average, the accuracy achieved while playing the chunks obtained with a 30 ms silence length was 89% (against 83% achieved by the ASR); the accuracy achieved while playing the chunks obtained with a 60 ms silence length was 78% (against 73% achieved by the ASR); the accuracy achieved while playing the chunks obtained with a 90 ms silence length was 77% (against 72% achieved by the ASR). Note that these percentages include the words truncated by the automatic splitting mechanism.
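The accuracy metric is not defined in the text. One common choice, shown here purely as an assumption, is word accuracy computed as one minus the word error rate, where the errors are the word-level edit distance (substitutions, insertions and deletions) between a reference transcript and the produced one. The function name `word_accuracy` and the lowercase, whitespace-based tokenization are illustrative choices.

```python
def word_accuracy(reference, hypothesis):
    """Word accuracy = 1 - (word-level edit distance / reference length)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    errors = prev[-1]
    return max(0.0, 1.0 - errors / len(ref))
```

For the twelve-word sentence "The Web is an Internet application that links millions of web pages", a version that drops "Internet" and writes "billions" for "millions" contains two errors, giving a word accuracy of 10/12, about 83%.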

Summary of Results
From the evaluation process, it emerged that: i) participants liked the game approach; ii) the Textual mode was widely played by participants; and iii) the accuracy of the produced transcripts is higher than the one obtained by using the ASR application.
It also emerged that the audio splitting mechanism should consider silences of 30 milliseconds. In fact, although the 30 ms silence increased the number of truncated words (see Table 1), the accuracy obtained in the transcription process is much higher than the one obtained with longer silences.

Conclusions and Future Work
In this paper we proposed ALGA, an altruistic game designed to ease the process of video lecture transcription. In particular, our approach aims at involving students in this difficult task. Throughout the paper we detailed the rules and the details of the game, as well as the characteristics of the prototype we developed in order to investigate the effectiveness of our proposal. Results obtained from the experimental assessment showed that participants liked the game approach and also showed that the accuracy achieved is higher than the one obtained by speech recognition technologies. According to these results, the game approach can be considered a promising and easy solution to increase the accessibility of digital video lecture contents, and the results suggest that gamification is a promising direction for the transcription of video lectures.
Although the obtained results are interesting, it is worth noting that ALGA is just a prototype and hence different enhancements may be introduced. For instance, it is web-based, but the design of a specific app may improve the students' experience in the mobile scenario. Note that this does not require any modification to the game architecture or to the game engine; it simply requires designing a more usable interface for mobile devices. Another enhancement regards the students' participation and engagement. Currently, the prototype employs two rewarding schemes: i) a high-score list, and ii) the availability of the transcription. Although these schemes contributed to achieving an acceptable participation level, they were designed to give us some metrics on the applicability of the gamification approach. Therefore, these schemes may be unfit for long-term engagement. In future versions it would be interesting to investigate the benefits of other rewarding schemes designed to increase both the participation level and the long-term engagement in the game. For instance, we intend to incorporate and evaluate schemes that may be more rewarding (we recall here that ALGA does not consider monetary rewards):
• Social Media. In the social age, updates and recommendations play a fundamental role. Social friends are invited to play specific games and are informed about game achievements. Therefore, ALGA should be linked to social media platforms in order to update (either automatically or manually) the player's friends about his/her ALGA score and to let them know that he/she is playing the ALGA game. The resulting post/tweet may prompt friends to try the game (among those who do not know it) or to play it again (among those who have already played it).
• Credits by Play. A student may earn credits for his/her participation in the game. The game director should set a threshold (or a set of thresholds) that corresponds to the number of earned credits (e.g., above 70% of chunks played for course X you get 1 credit). Similarly, the threshold (or a set of thresholds) may correspond to the number of questions a student may skip at the written exam (e.g., above 70% of chunks played for course X you may skip 3 questions).
• Homework by Play. A student may skip a number of homework assignments if he/she plays a predefined number of chunks. Also in this case, the threshold should be up to the game director (e.g., above 70% of chunks played for course X you may skip 1 homework assignment).
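The threshold-based schemes above (Credits by Play and Homework by Play) can be sketched as a simple lookup. This is only an illustrative sketch, not part of the ALGA prototype: the function name reward_for, the (fraction, reward) representation of thresholds, and the reward labels are assumptions. "Above 70%" is read here as strictly greater than 70%.

```python
def reward_for(chunks_played, chunks_total, thresholds):
    """Return the reward of the highest threshold exceeded, or None.

    thresholds: list of (fraction, reward) pairs set by the game director,
    e.g. [(0.7, "1 credit")] (hypothetical representation).
    """
    if chunks_total <= 0:
        return None
    fraction = chunks_played / chunks_total
    earned = None
    # Walk the thresholds from lowest to highest and keep the last one exceeded.
    for min_fraction, reward in sorted(thresholds, key=lambda pair: pair[0]):
        if fraction > min_fraction:
            earned = reward
    return earned


# Example for a hypothetical course X with 100 chunks.
credits_scheme = [(0.7, "1 credit")]
homework_scheme = [(0.5, "skip 1 homework"), (0.7, "skip 2 homework")]
print(reward_for(71, 100, credits_scheme))
print(reward_for(60, 100, homework_scheme))
```

Keeping the thresholds as data rather than code would let the game director change them per course without touching the game engine.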
To identify the most suitable scheme, we think it is necessary to dynamically understand what can motivate students. Indeed, a rewarding scheme may be valid this year, but may be unfit the next year (students and their habits change). Therefore, ALGA should ask its players to fill in a questionnaire about the preferred rewarding schemes. This questionnaire should be submitted just once to any player, either after he/she has played some matches or after a silent period (i.e., a period of time in which the student does not play the game; in this case an email is necessary to invite him/her to complete the questionnaire). The questionnaire should ask students to rate the reasons to participate in the game (e.g., transcript availability, homework by play, credits by play, etc.) and/or the reasons for having stopped playing ALGA. The use of one or more of these schemes should increase the participation level and the long-term engagement in the game.
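The rule for submitting the questionnaire can be sketched as a small decision function. The sketch is illustrative only: the function name, the parameter names, and the default values (5 matches, 30 idle days) are assumptions, since the text does not fix them.

```python
def questionnaire_channel(already_asked, matches_played, days_idle,
                          min_matches=5, idle_days=30):
    """Decide whether and how to submit the questionnaire to a player.

    Returns "email", "in_game", or None. The defaults are hypothetical values.
    """
    if already_asked:
        # The questionnaire is submitted just once to any player.
        return None
    if days_idle >= idle_days:
        # Silent period: the player is not in the game, so invite by email.
        return "email"
    if matches_played >= min_matches:
        # Active player who has already played some matches.
        return "in_game"
    return None


print(questionnaire_channel(False, 8, 2))
print(questionnaire_channel(False, 8, 45))
```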

Figure 1: States and transitions of an audio transcript t_i.

Figure 2: ALGA architecture: it is designed as a Web-based game in order to improve accessibility.

Figure 3: Players can select the course they want to play with by selecting the course degree (left) and the course name (center).

Figure 4: Players can listen to the audio chunk and enter the transcript. The player is then notified whether he/she gained points and of his/her current position in the high-score list, and is asked whether he/she wants to play again.

Figure 5: A player can keep track of his/her position in any module he/she played. In addition to the current ranking position, a shortcut to play the module is available. A special icon appears when the player is in first position.

Figure 6: Results of the MOS investigation. Q1: Evaluate the game in general. Q2: Was the game fun to play? Q3: Was the final transcript useful? Q4: Did the final transcript meet your expectations? Standard deviation was in the range [0.7-1.1].

Table 1: Characteristics of the audio chunks produced by the audio splitting mechanism. The minimum audio chunk length is set to 5 seconds and the silence length varies. The number of truncated words refers to a video lecture of 45 minutes.