Frequently asked questions

How does it work?

MixTheWeb automatically builds video mashups from one music track, called the reference track, and several videos provided by the user.
The output mashup is composed of excerpts from the input videos, selected to follow the rhythmic structure of the reference track. The result can be viewed as a video clip.

The aggregation system proceeds in three main steps:

  • Coarse music segmentation

    The first step detects key points in the reference track and uses them to segment the track into homogeneous regions.

  • Activity detection

    Activity measures how much happens within one such region. The key idea of the video mashup is to match activity in the videos with activity in the music, which requires a description of the data that is valid for both music and video.

    1. Activity in music is measured by computing the note onset detection function, a classic feature for describing audio content.

    2. Activity in video is measured by extracting abrupt video transitions, called cuts. The technique is based on the temporal evolution of the distance between successive frames.

  • Aggregation

    The two previous steps define the criteria used for content aggregation. Segmenting the reference track into regions and computing the activity within each one allows us to match video to music.
    For each region, we select the excerpt from the videos that maximizes the correlation (similarity) between the two activities.
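
The steps above can be illustrated with a minimal Python sketch. It is not MixTheWeb's actual implementation: the function names (`onset_strength`, `frame_distances`, `pearson`, `best_excerpt`) are hypothetical. The sketch assumes that the onset function can be approximated by positive energy increases, that frames are flattened grayscale pixel lists, that similarity is the Pearson correlation, and that music and video activity are sampled at the same rate.

```python
from math import sqrt
from typing import List, Sequence

# Assumed representation: one frame = flattened grayscale pixels in [0, 1].
Frame = Sequence[float]


def onset_strength(energies: Sequence[float]) -> List[float]:
    """Music activity: positive energy increases between analysis frames.

    A simplified stand-in for a note onset detection function.
    """
    if not energies:
        return []
    return [0.0] + [max(0.0, energies[i] - energies[i - 1])
                    for i in range(1, len(energies))]


def frame_distances(frames: Sequence[Frame]) -> List[float]:
    """Video activity: mean absolute difference between successive frames.

    A large value at index i suggests a cut between frame i-1 and frame i.
    """
    if not frames:
        return []
    distances = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        distances.append(sum(abs(p - c) for p, c in zip(prev, cur)) / len(cur))
    return distances


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def best_excerpt(region_activity: Sequence[float],
                 video_activity: Sequence[float]) -> int:
    """Return the start index of the video window whose activity
    correlates best with the activity of one music region."""
    n = len(region_activity)
    if n == 0 or len(video_activity) < n:
        raise ValueError("video activity must be at least as long as the region")
    best_start, best_score = 0, float("-inf")
    for start in range(len(video_activity) - n + 1):
        score = pearson(region_activity, video_activity[start:start + n])
        if score > best_score:
            best_start, best_score = start, score
    return best_start


# Toy example: the music region alternates quiet/active; the video window
# starting at index 2 shows the same alternation.
region = [0.0, 1.0, 0.0, 1.0]
video = [0.2, 0.2, 0.1, 0.9, 0.1, 0.8, 0.3]
print(best_excerpt(region, video))  # prints 2
```

In a full pipeline, `best_excerpt` would be called once per region of the reference track, and the selected excerpts would be concatenated in region order to form the mashup.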