Program for the automatic editing of video lectures for distance learning

Task

Video lectures shot in a classroom must be edited: non-informative parts of the clips are removed, color correction is performed, and so on, so that a lecture shot “in the field” is convenient and pleasant for students to watch remotely. At the processing stage, editors often have to deal with video defects caused by lighting or projector problems, or by unexpected incidents during shooting.

The manual static assembly method is used to replace the projector image in the recording with frames from the original presentation materials. This makes the presentation more legible but has several disadvantages: the result looks unnatural because the color models of the video and the presentation materials do not match, and part of the lecturer, or any foreign object that enters the presentation area, is "cut off".

The goal was to develop a software model for the intelligent analysis of video lectures that performs static assembly automatically and without the disadvantages described above.

The developed system performs the following key functions:

Pre-processing of video data to improve quality and adjust the color and lighting models
Video stream processing to detect, localize, recognize and classify elements in a frame
Removal of uninformative elements of a scene
Replacement of the projector image in the video with frames from the original presentation materials
Placement of a layer with the lecturer on top of the overlaid presentation materials
Saving of the video in specified formats, including formats suitable for subsequent manual processing

Solution

The developed software package consists of two subsystems.

“Autoslide”

It automatically combines the lecture video with the lecturer's presentation during post-processing. Both files are uploaded to the system. A neural network determines which slide is shown at each moment and automatically replaces it with a clear, bright, properly sized slide from the original presentation. This frees the editor from the long, routine work of manually inserting the materials.

The system also recognizes the lecturer's figure in the video stream, extracts it, and places it on top of the inserted slide, which keeps the output video realistic.
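
The layering step can be illustrated with a minimal sketch. It is not the project's implementation: it assumes the lecturer mask is already available as per-pixel weights between 0 and 1, represents images as nested lists of RGB tuples, and uses a hypothetical function name, composite_lecturer.

```python
def composite_lecturer(slide, video, mask):
    """Blend the lecturer from the video frame over the inserted slide.

    slide, video: rows of (r, g, b) tuples with identical dimensions.
    mask: rows of floats in [0, 1]; 1.0 means the pixel belongs to the
    lecturer, 0.0 means the slide pixel is kept (assumed representation).
    """
    result = []
    for slide_row, video_row, mask_row in zip(slide, video, mask):
        row = []
        for slide_px, video_px, weight in zip(slide_row, video_row, mask_row):
            # Standard alpha blending, applied per color channel.
            row.append(tuple(
                round(weight * v + (1.0 - weight) * s)
                for s, v in zip(slide_px, video_px)
            ))
        result.append(row)
    return result
```

Fractional mask values along the lecturer's outline give a soft edge, so the figure is not pasted onto the slide with a hard border.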

“Auto Cameraman”

It allows video lectures to be shot without a videographer. Ultra-high-resolution cameras installed in the auditorium capture a wide shot of the whole scene. During video processing, the system independently determines the active zone and frames the image, relying on the experience of professional cameramen.

Advantages:

Automates the work of cameramen and editors, which typically takes up to 80% of the total time and cost of production.
Makes the production of video lectures cheaper: typically, up to 90% of the budget is spent on shooting and editing.
Accelerates the production of video lectures by eliminating manual editing, which takes up to 80% of the preparation time.
Improves the quality of the final video product: slides are inserted into the video stream more accurately, and the absence of visual interference makes the material easier to follow.
Offers an affordable alternative to integrated video conferencing solutions from world-famous manufacturers, as it does not require the purchase and installation of expensive specialized equipment.

Details

Two software tools have been developed and combined in a single web interface, so they can be used individually or together, including sequential processing of a video lecture by each subsystem in turn.

To use the Autoslide subsystem, you upload the source video file and the presentation materials, then adjust the settings to match the conditions in which the lecture was shot and the properties of the presentation materials. At the beginning of processing, the presentation area is detected throughout the entire video, which helps to stabilize the image. Next, for each sequence of frames showing the same image in the presentation area, the corresponding slide is selected from the presentation materials using the key point method; the replacement is then performed, followed by color correction. At the end of processing, the lecturer's mask is extracted and used to place the lecturer on top of the overlaid presentation; this avoids unrealistic moments in which the presentation covers the lecturer. You can download the result either as a video file or as a project file compatible with Adobe Premiere Pro for subsequent manual processing.
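
The slide selection step described above can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: it assumes each frame already has a comparable signature of its presentation area and that a key point matcher has already counted matches against every slide. The names segment_frames, pick_slide and the min_matches threshold are hypothetical.

```python
from itertools import groupby


def segment_frames(signatures):
    """Group consecutive frames whose presentation area looks the same.

    signatures: one hashable value per frame (assumed to be produced by an
    earlier detection step). Returns (start, end_exclusive, signature) tuples.
    """
    segments = []
    start = 0
    for signature, run in groupby(signatures):
        length = sum(1 for _ in run)
        segments.append((start, start + length, signature))
        start += length
    return segments


def pick_slide(match_counts, min_matches=10):
    """Choose the slide with the most matched key points for one segment.

    match_counts: dict mapping slide index to the number of matched key
    points. Returns None when no slide reaches min_matches, so the segment
    can be left untouched instead of receiving a wrong slide.
    """
    if not match_counts:
        return None
    best = max(match_counts, key=match_counts.get)
    return best if match_counts[best] >= min_matches else None
```

Selecting one slide per segment rather than per frame keeps the inserted slide stable while the lecturer stays on the same slide.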

To use the Auto Cameraman subsystem, you upload the original video file shot with an ultra-high-resolution camera fixed in place in the auditorium. At the beginning of processing, the lecturer and the information display areas (chalk or marker boards, projectors, flip charts) are detected, specifically those that are actually used to display information in the given lecture. Then, based on the digitized experience of cameramen, the video stream is cropped using static positions and transitions between them. As a result of the Auto Cameraman subsystem's work, non-informative elements of the scene are removed.
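
Cropping by static positions and transitions can be pictured with the sketch below. It is only an illustration: the real subsystem relies on the digitized experience of cameramen, whereas this example simply assumes linear interpolation between crop rectangles. The function names and the (x, y, width, height) rectangle layout are hypothetical.

```python
def lerp_rect(start, end, t):
    """Linearly interpolate between two (x, y, w, h) rectangles, 0 <= t <= 1."""
    return tuple(round(a + (b - a) * t) for a, b in zip(start, end))


def crop_path(shots, transition_frames):
    """Build one crop rectangle per output frame.

    shots: list of (hold_frames, rect) pairs, i.e. static positions.
    transition_frames: number of in-between frames inserted between two
    consecutive static positions (assumed constant for simplicity).
    """
    path = []
    for index, (hold_frames, rect) in enumerate(shots):
        # Static position: the crop window does not move.
        path.extend([rect] * hold_frames)
        if index + 1 < len(shots):
            next_rect = shots[index + 1][1]
            # Transition: move the window gradually toward the next position.
            for step in range(1, transition_frames + 1):
                t = step / (transition_frames + 1)
                path.append(lerp_rect(rect, next_rect, t))
    return path
```

For example, crop_path([(2, (0, 0, 100, 100)), (1, (300, 0, 100, 100))], 2) holds the first window for two frames, moves it through two intermediate positions, and then holds the second window.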

Neural network technologies were used to solve these problems. The source data provided by the customer was labeled and used to train the neural network. This achieves a high level of realism when simulating the work of cameramen and editors.

Project Stages:

Creation of the Autoslide subsystem.
Creation of the Auto Cameraman subsystem.

Collaboration:

"Lectorium": providing initial data (including synthetic data for hypothesis testing), consultations on the work of cameramen and editors, and interim and final testing of the developed software.
Laboratory PSPOD: development of algorithms and software for the intelligent analysis of video lectures.

The scientific and technical novelty of the development is the automation of the work of cameramen and editors by non-algorithmic methods. This approach is better suited to the hard-to-formalize, partly creative work of producing video lectures.

Recognition accuracy for key elements in the frame reaches 99% in studio conditions and about 95% in classroom conditions. Processing speed is about 10 frames per second.
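
To put the processing speed in perspective, the following sketch estimates the processing time for a whole lecture. The lecture length and the source frame rate are illustrative assumptions, not figures from the project; the estimate also assumes that every frame is processed at the stated rate of about 10 frames per second.

```python
def processing_hours(duration_minutes, video_fps, processing_fps=10.0):
    """Estimate processing time in hours for a recording.

    duration_minutes and video_fps describe the source video (assumed
    values); processing_fps defaults to the reported figure of about 10.
    """
    total_frames = duration_minutes * 60 * video_fps
    return total_frames / processing_fps / 3600


# A hypothetical 90-minute lecture recorded at 25 frames per second.
print(processing_hours(90, 25))  # 3.75
```

Under these assumptions, processing takes about 2.5 times the duration of the recording.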

Currently, the Auto Cameraman subsystem is being refined to expand its functionality. In addition, there are plans to create new intelligent products in the field of distance education.

Technical advantages:

Applicability to video lectures at the post-processing stage
Increased realism through the use of neural network technologies
Configurability of the applied modules.

Technologies

Software programming languages and frameworks	C++, Qt, OpenCV, FFmpeg, RabbitMQ (AMQP)
Web languages and technologies	PHP, JavaScript, CSS, RabbitMQ
OS	Linux
Architectures	x86
VCS	Git (GitLab)
DBMS/DB	MariaDB
IDE	Qt Creator
Reverse engineering	Adobe Premiere project file

Intellectual Property

Publications

The project was implemented in cooperation with the educational project “Lectorium” and with financial support from the Fund for Assistance to the Development of Small Enterprises in the Scientific and Technical Field.

Project team

Project manager - M. Bolsunovskaya
Mathematician-programmer - N. Abramov
Lead developer - K. Belyaevsky
Web developers - A. Nikitina, M. Fomina
Project manager - A. Gintsiak