Parallel video transcoding using Hadoop MapReduce

Video transcoding has become a key technology for video content distribution network (CDN) services. In this paper, we propose a novel MapReduce-based parallel video transcoding method. In our method, video files are stored on a shared file system to reduce disk I/O and network overhead in Hadoop MapReduce. FFmpeg is used both to compute the video's split points and to perform the actual transcoding. Experimental results show that our method significantly reduces transcoding time.

In our method, video files are stored on a shared file system to reduce disk I/O and network overhead in MapReduce. Video splitting does not require physically splitting the video: FFmpeg computes the time codes of the video's split points, and only these time codes are recorded.

Architecture of parallel video transcoding system
The parallel video transcoding system is composed of an application server, a shared file system, and a Hadoop cluster. The latter two are managed by the cloud management software CloudStack. Logically, the system can be divided into three layers, as shown in Figure 1. From bottom to top, they are the computing and storage layer, the application service layer, and the portal layer.
(2) Application service layer: The core function of the application service layer is to start and schedule video transcoding jobs. Typically, a user sends a transcoding request to the web application server, which then submits the transcoding job to the Hadoop namenode according to the user's transcoding requirements. The namenode assigns the transcoding job to multiple datanodes, and each datanode calls FFmpeg to execute its part of the parallel transcoding.
(3) Portal layer: The portal provides users with a variety of business functions, such as uploading original videos, setting transcoding requirement templates, and querying transcoding jobs.
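The request flow of the application service layer, in which a portal request plus a requirement template becomes a Hadoop job, can be sketched as follows. This is a minimal illustration under assumptions, not the authors' code: the tab-separated task-file layout, the `TranscodeRequirement` fields and their default values, the `/shared/...` paths, and the `build_task_lines` helper are all hypothetical. It shows how an application server might turn one request, plus precomputed split time codes, into one task line per sub-clip for the transcoding task file kept in HDFS.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class TranscodeRequirement:
    """Hypothetical requirement template chosen in the portal.

    Field names and defaults are illustrative assumptions, not the
    parameters listed in Table 2.
    """
    video_codec: str = "libx264"
    width: int = 1280
    height: int = 720
    video_bitrate: str = "2M"


def build_task_lines(input_path: str, output_prefix: str,
                     split_points: Sequence[float], duration: float,
                     req: TranscodeRequirement) -> List[str]:
    """Return one tab-separated task line per sub-clip (hypothetical format).

    Columns: index, input path, start (s), length (s), output path,
    video codec, frame size, video bitrate. Only time codes are recorded;
    the source video itself is never physically split.
    """
    bounds = [0.0, *sorted(split_points), duration]
    lines = []
    for i, (start, end) in enumerate(zip(bounds, bounds[1:])):
        if end <= start:
            raise ValueError(f"empty sub-clip between {start} and {end}")
        fields = [
            str(i),
            input_path,
            f"{start:.3f}",
            f"{end - start:.3f}",
            f"{output_prefix}_part{i:03d}.mp4",
            req.video_codec,
            f"{req.width}x{req.height}",
            req.video_bitrate,
        ]
        lines.append("\t".join(fields))
    return lines


if __name__ == "__main__":
    # Three sub-clips for a 30-second video split at 12 s and 18 s.
    for line in build_task_lines("/shared/in/video.mp4", "/shared/out/video",
                                 [12.0, 18.0], 30.0, TranscodeRequirement()):
        print(line)
```

Keeping only time codes in the task file keeps it tiny, which fits the paper's design of leaving the video data itself on the shared file system.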

Implementation of parallel video transcoding system
Parallel video transcoding consists of two main phases: splitting the video clip, and distributed video transcoding using Hadoop MapReduce, as shown in Figure 2.
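The two phases map naturally onto MapReduce: each mapper transcodes one sub-clip, and a reducer merges the results in order. The sketch below is a minimal illustration under assumptions; it only builds FFmpeg command lines and does not run them. The task-line layout, the helpers `map_command`, `concat_list`, and `reduce_command`, and the choice of AAC audio are hypothetical. The FFmpeg options used (`-ss`, `-t`, `-c:v`, `-b:v`, `-s`, and the concat demuxer via `-f concat -safe 0` with `-c copy`) are standard FFmpeg options, but the paper does not state which options the authors used.

```python
from typing import Iterable, List, Tuple


def map_command(task_line: str) -> Tuple[int, List[str]]:
    """Mapper side: build an FFmpeg command that transcodes one sub-clip.

    Assumed task-line layout (tab-separated): index, input path, start (s),
    length (s), output path, video codec, frame size, video bitrate.
    """
    idx, src, start, length, dst, vcodec, size, bitrate = task_line.rstrip("\n").split("\t")
    cmd = [
        "ffmpeg", "-y",
        "-ss", start,        # seek to the sub-clip's split point (a key frame)
        "-i", src,           # read directly from the shared file system
        "-t", length,        # transcode only this sub-clip's duration
        "-c:v", vcodec, "-b:v", bitrate, "-s", size,
        "-c:a", "aac",       # assumption: audio re-encoded to AAC
        dst,
    ]
    return int(idx), cmd


def concat_list(parts: Iterable[Tuple[int, str]]) -> str:
    """Reducer side: concat-demuxer list with sub-clips sorted by index.

    Assumes the paths contain no single quotes.
    """
    return "".join(f"file '{path}'\n" for _, path in sorted(parts))


def reduce_command(list_path: str, final_output: str) -> List[str]:
    """Merge the transcoded sub-clips without re-encoding them."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_path, "-c", "copy", final_output]


if __name__ == "__main__":
    line = "1\t/shared/in/v.mp4\t12.000\t6.000\t/shared/out/v_part001.mp4\tlibx264\t1280x720\t2M"
    idx, cmd = map_command(line)
    print(idx, " ".join(cmd))
    print(concat_list([(1, "/shared/out/v_part001.mp4"),
                       (0, "/shared/out/v_part000.mp4")]), end="")
    print(" ".join(reduce_command("/tmp/parts.txt", "/shared/out/v.mp4")))
```

Emitting the sub-clip index as the key lets the reducer restore the original order before concatenation; stream-copying in the merge step avoids a second encode, assuming all sub-clips were encoded with identical parameters.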

Splitting the video clip
The purpose of video splitting is to segment the whole video clip into multiple sub-clips, so that Hadoop MapReduce can schedule multiple mappers to execute FFmpeg transcoding commands on these sub-clips in parallel. In our implementation, there is no need to physically split the video; we only compute and record the time codes of the split points. Transcoding the sub-clips in parallel reduces the transcoding time of the whole video clip. The number of sub-clips can be determined by the number of datanodes in the Hadoop cluster; usually, it equals the number of datanodes. Note that every split point must fall on a key frame of the video; otherwise, frames will be missing or repeated when the Reducer combines the transcoded sub-clips.
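Because split points must fall on key frames, the split step reduces to reading the key-frame time codes and choosing, for N datanodes, the N - 1 key frames closest to evenly spaced target times. The sketch below assumes exactly that strategy (nearest key frame to an even split); the paper does not say how the authors choose among key frames. The `ffprobe` invocation is a common recipe for listing key-frame timestamps (`-skip_frame nokey` with `-show_entries frame=pts_time`); the helper names are hypothetical, and the command is only built here, not run.

```python
import bisect
from typing import List, Sequence


def keyframe_probe_command(video_path: str) -> List[str]:
    """ffprobe command that prints one key-frame timestamp (seconds) per line."""
    return ["ffprobe", "-v", "error", "-select_streams", "v:0",
            "-skip_frame", "nokey", "-show_entries", "frame=pts_time",
            "-of", "csv=p=0", video_path]


def parse_keyframes(probe_output: str) -> List[float]:
    """Parse ffprobe output, skipping blank lines and 'N/A' values."""
    times = []
    for raw in probe_output.splitlines():
        value = raw.strip().rstrip(",")
        if value and value != "N/A":
            times.append(float(value))
    return sorted(times)


def choose_split_points(keyframes: Sequence[float], duration: float,
                        num_clips: int) -> List[float]:
    """Pick up to num_clips - 1 interior split points, each snapped to the
    key frame nearest an evenly spaced target time.

    Fewer points are returned when key frames are too sparse; a key frame
    already chosen is not reused, so no sub-clip is empty.
    """
    if num_clips < 1:
        raise ValueError("num_clips must be at least 1")
    kfs = sorted(t for t in keyframes if 0.0 < t < duration)
    points: List[float] = []
    for i in range(1, num_clips):
        target = duration * i / num_clips
        j = bisect.bisect_left(kfs, target)
        candidates = kfs[max(j - 1, 0): j + 1]  # key frames just before/after target
        if not candidates:
            break
        best = min(candidates, key=lambda t: abs(t - target))
        if not points or best > points[-1]:
            points.append(best)
    return points


if __name__ == "__main__":
    # Hypothetical key frames of a 30 s video, split across 3 datanodes.
    kf = parse_keyframes("0.0\n3.0\n7.0\n12.0\n18.0\n25.0\n29.0\n")
    print(choose_split_points(kf, 30.0, 3))  # [12.0, 18.0]
```

Snapping to the nearest key frame trades perfectly equal sub-clip lengths for frame-exact merging; with sparse key frames the sub-clips can be noticeably unbalanced, which would lengthen the slowest mapper.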

Experiment and discussion
In this paper, we compare the time efficiency of single-machine video transcoding and parallel transcoding in Hadoop. The experimental environment is a Hadoop cluster with four nodes: one namenode and three datanodes. All four nodes have the same hardware configuration: two 1 GHz CPUs and 16 GB of memory. The experimental data are three videos of different file sizes, as shown in Table 1. The transcoding requirements are shown in Table 2, and the experimental results are shown in Table 3.

Table 3 shows that the benefit of parallel video transcoding is not particularly noticeable when the video file is small, as with video1 (8316 KB). This is mainly due to the time cost of scheduling the Hadoop MapReduce job and of merging the transcoded sub-clips. However, when the video file is large, as with video3 (5.46 GB), parallel transcoding improves efficiency markedly: its time cost is only 40% of that of single-machine transcoding. Configuring more high-performance transcoding nodes in the cloud should improve transcoding efficiency further.
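The 40% figure for video3 can be turned into the usual parallel-performance metrics. The sketch below assumes the three datanodes are the only transcoding workers and, as an extra modelling assumption not made in the paper, fits Amdahl's law S = 1 / ((1 - p) + p / N) to that single data point. The fitted parallel fraction p and the six-datanode figure are model estimates, not measurements.

```python
def speedup(t_single: float, t_parallel: float) -> float:
    """Speedup: single-machine time divided by parallel time."""
    return t_single / t_parallel


def efficiency(s: float, workers: int) -> float:
    """Parallel efficiency: achieved speedup relative to the ideal of one per worker."""
    return s / workers


def amdahl_parallel_fraction(s: float, workers: int) -> float:
    """Solve Amdahl's law S = 1 / ((1 - p) + p / N) for the parallel fraction p."""
    return (1 - 1 / s) / (1 - 1 / workers)


def amdahl_speedup(p: float, workers: int) -> float:
    """Speedup predicted by Amdahl's law for parallel fraction p on N workers."""
    return 1 / ((1 - p) + p / workers)


if __name__ == "__main__":
    s = speedup(1.0, 0.4)                 # video3: parallel time is 40% of single-machine time
    e = efficiency(s, 3)                  # three datanodes do the transcoding
    p = amdahl_parallel_fraction(s, 3)    # model fit, not a measured quantity
    print(f"speedup={s:.2f} efficiency={e:.2f} parallel_fraction={p:.2f}")
    # Model extrapolation only: predicted speedup with 6 datanodes, and the upper bound.
    print(f"6 datanodes: {amdahl_speedup(p, 6):.2f}x, limit: {1 / (1 - p):.2f}x")
```

With these numbers the speedup is 1 / 0.4 = 2.5, the efficiency is 2.5 / 3, about 0.83, and the fit gives p = 0.6 / (2/3) = 0.9. Under this model, adding nodes still helps but the speedup cannot exceed 1 / (1 - p) = 10, which is one way to read the scheduling and merging overhead noted above.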

Conclusions
With the rapid development of Internet video services, fast video transcoding has become an urgent problem for CDN providers. This paper proposes a novel parallel video transcoding method based on the original Hadoop MapReduce framework and FFmpeg to achieve efficient distributed transcoding. The key elements of the method are video splitting and parallel transcoding, which together significantly improve transcoding speed. In future work, we plan to study job scheduling for video transcoding to improve the overall efficiency of the transcoding system.

Fig 1. The framework overview of parallel video transcoding system
(1) Computing and storage layer: The Hadoop cluster handles the computing tasks of video transcoding. The storage system consists of two parts: the original and transcoded videos are stored in the shared file system, while the video transcoding task file (configuration file) is stored in the HDFS of the Hadoop cluster.

Fig 2. The framework overview of parallel video transcoding system

Table 1. Video information for performance evaluation.

Table 2. Requirement parameters for video transcoding.

Table 3. Time performance comparison between single-machine video transcoding and parallel transcoding in Hadoop.