Video Presentation Detector

A tool to automatically detect and extract presentations from videos of conferences, livestreams, or lectures that contain both presentations and break screens.

Features

- Automatically detects transitions between break screens and presentations
- Supports batch processing of multiple videos
- Extracts presentations as separate video files
- Extracts audio from presentations as MP3 files
- Produces detailed output with timestamps and presentation durations
- Is configurable through a YAML configuration file

Installation

Create and activate a virtual environment:

# Create a virtual environment
uv venv

# Activate it (Unix/MacOS)
source .venv/bin/activate

# Activate it (Windows)
.\.venv\Scripts\activate

Install video processor dependencies:

# Install with video processing dependencies
uv pip install ".[video_processor]"

Install FFmpeg, which is required for video and audio extraction. The tool will not work without it.

# macOS (using Homebrew)
brew install ffmpeg

# Alternative for macOS (using MacPorts)
sudo port install ffmpeg

# Ubuntu/Debian
sudo apt-get update
sudo apt-get install ffmpeg

# Fedora
sudo dnf install ffmpeg

# Arch Linux
sudo pacman -S ffmpeg

# Windows
# Download from ffmpeg.org/download.html
# Extract the files and add the bin folder to your PATH

Verify installation with:

ffmpeg -version
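
The same check can be scripted as a preflight step before processing. This is a minimal sketch using only the Python standard library; `ffmpeg_version` is a hypothetical helper name, not something the tool is documented to provide.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version(executable: str = "ffmpeg") -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if it is not found."""
    path = shutil.which(executable)  # looks the executable up on PATH
    if path is None:
        return None
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    version = ffmpeg_version()
    print(version if version else "FFmpeg was not found on PATH")
```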

Configuration

Configuration is stored in config.yaml. If the file is not present, a default one is created automatically.

Key configuration options:

input:
  # Single video file path (leave empty to use folder)
  video_path: ""
  # Folder containing videos to batch process
  folder: "input_videos"
  # File extensions to process from input folder
  extensions: "mp4,mkv,avi,mov,webm"

video:
  # Whether to resize frames for processing (speeds up detection but reduces accuracy)
  enable_resize: false
  # If resize is enabled, dimensions to use (width, height)
  processing_size: [320, 180]

break_detection:
  # Directory for break screen images (leave empty for auto-detection)
  images_dir: ""
  # Similarity threshold for break screen detection (0-1)
  threshold: 0.92
  # Whether to auto-detect break screens if none provided
  auto_detect: true

output:
  # Base output folder for extracted presentations
  folder: "extracted_presentations"
  # Whether to extract detected presentations as separate files
  extract_presentations: true
  # Whether to extract audio from presentations as MP3
  extract_audio: true
  # Whether to save presentation metadata as JSON
  save_metadata: true
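
To illustrate how a partial custom config could relate to these defaults, here is a sketch of a recursive merge over plain dictionaries, such as those a YAML parser returns. Whether the tool actually merges partial configs this way is an assumption, and `DEFAULT_CONFIG` and `merge_config` are hypothetical names.

```python
import copy

# Defaults mirroring the options listed above.
DEFAULT_CONFIG = {
    "input": {
        "video_path": "",
        "folder": "input_videos",
        "extensions": "mp4,mkv,avi,mov,webm",
    },
    "video": {"enable_resize": False, "processing_size": [320, 180]},
    "break_detection": {"images_dir": "", "threshold": 0.92, "auto_detect": True},
    "output": {
        "folder": "extracted_presentations",
        "extract_presentations": True,
        "extract_audio": True,
        "save_metadata": True,
    },
}


def merge_config(defaults: dict, overrides: dict) -> dict:
    """Return a copy of `defaults` with `overrides` applied recursively."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# A custom config that only lowers the threshold and enables resizing.
custom = {"break_detection": {"threshold": 0.85}, "video": {"enable_resize": True}}
config = merge_config(DEFAULT_CONFIG, custom)
```

With this approach a custom file only needs to list the keys it changes; everything else keeps its default value.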

Usage

Basic Usage

Single Video Processing

python -m video_manager.presentation_detector path/to/video.mp4 --extract --audio

This will:
- Process the specified video
- Extract individual presentations as separate video files
- Extract audio from each presentation

Batch Processing

python -m video_manager.presentation_detector --input-folder path/to/videos --extract --audio
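
Batch mode has to decide which files in the folder count as videos. Below is a minimal sketch of that selection, assuming it matches file suffixes against the comma-separated `extensions` value from the configuration. `find_videos` is a hypothetical helper, not part of the tool's documented interface.

```python
from pathlib import Path


def find_videos(folder: str, extensions: str = "mp4,mkv,avi,mov,webm") -> list[Path]:
    """List files in `folder` whose suffix is in the comma-separated `extensions`."""
    wanted = {
        "." + ext.strip().lower().lstrip(".")
        for ext in extensions.split(",")
        if ext.strip()
    }
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in wanted
    )
```

The comparison is case-insensitive, so a file named `talk.MKV` is picked up as well.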

Advanced Options

usage: python -m video_manager.presentation_detector [-h] [--config CONFIG] [--output OUTPUT]
                                                    [--break-images BREAK_IMAGES] [--extract]
                                                    [--audio] [--input-folder INPUT_FOLDER]
                                                    [video_path]

arguments:
  video_path                       Path to the video file to process
  --config, -c CONFIG              Path to custom config file
  --output, -o OUTPUT              Custom output folder for extracted presentations
  --break-images, -b BREAK_IMAGES  Directory containing break screen images
  --extract, -e                    Extract presentations as separate files
  --audio, -a                      Extract audio from presentations as MP3
  --input-folder, -i INPUT_FOLDER  Process all videos in specified folder
  -h, --help                       Show help message
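
The option list above maps naturally onto an `argparse` parser. The sketch below is reconstructed from the help text only and is not the project's actual parser; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented flags; argparse adds -h/--help automatically."""
    parser = argparse.ArgumentParser(
        prog="python -m video_manager.presentation_detector"
    )
    parser.add_argument("video_path", nargs="?",
                        help="Path to the video file to process")
    parser.add_argument("--config", "-c", help="Path to custom config file")
    parser.add_argument("--output", "-o",
                        help="Custom output folder for extracted presentations")
    parser.add_argument("--break-images", "-b",
                        help="Directory containing break screen images")
    parser.add_argument("--extract", "-e", action="store_true",
                        help="Extract presentations as separate files")
    parser.add_argument("--audio", "-a", action="store_true",
                        help="Extract audio from presentations as MP3")
    parser.add_argument("--input-folder", "-i",
                        help="Process all videos in specified folder")
    return parser


args = build_parser().parse_args(["talk.mp4", "-e", "-a", "-o", "out"])
print(args.video_path, args.extract, args.audio, args.output)
# prints: talk.mp4 True True out
```

Because `video_path` is optional (`nargs="?"`), the same parser accepts batch invocations that pass only `--input-folder`.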

Example Commands

Using custom break screen detection:

python -m video_manager.presentation_detector video.mp4 --break-images path/to/break/images

Using custom output location:

python -m video_manager.presentation_detector video.mp4 --output path/to/output/folder

Using a custom config file:

python -m video_manager.presentation_detector --config path/to/custom/config.yaml

How It Works in Detail

1. Frame Sampling and Analysis

The detector samples frames at regular intervals throughout the video; the sampling rate is configurable. For each sampled frame:
- The frame is converted to grayscale and normalized
- Visual features are extracted using image processing techniques
- The frame is kept in memory for comparison
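
The sampling and normalization steps can be sketched in plain Python. The two-second interval, the BT.601 luma weights, and the nested-list frame representation are all assumptions made to keep the example self-contained; a real implementation would more likely work on arrays from a video library. Both function names are hypothetical.

```python
def sample_frame_indices(total_frames: int, fps: float, interval_seconds: float) -> list[int]:
    """Return the indices of frames spaced `interval_seconds` apart."""
    step = max(1, round(fps * interval_seconds))
    return list(range(0, total_frames, step))


def to_normalized_gray(frame: list[list[tuple[int, int, int]]]) -> list[list[float]]:
    """Convert rows of (R, G, B) pixels to luminance values in [0.0, 1.0]."""
    return [
        [(0.299 * r + 0.587 * g + 0.114 * b) / 255.0 for r, g, b in row]
        for row in frame
    ]


# A 10-second clip at 30 fps, sampled every 2 seconds.
indices = sample_frame_indices(total_frames=300, fps=30.0, interval_seconds=2.0)
# indices == [0, 60, 120, 180, 240]
```

A shorter interval catches brief break screens at the cost of more frames to compare.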

2. Break Screen Detection

Two methods are used for break screen detection:

Method 1: Using Provided Break Images
- If break images are provided, the detector compares sampled frames against these known break screens
- Similarity is calculated using histogram comparison or structural similarity
- Frames that match above the similarity threshold are classified as break screens
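
As an illustration of histogram comparison against a threshold, the sketch below uses histogram intersection on grayscale values in [0.0, 1.0]. The choice of intersection as the metric, the 16 bins, and all function names are assumptions; the tool may use a different histogram metric or structural similarity instead.

```python
def gray_histogram(pixels: list[float], bins: int = 16) -> list[float]:
    """Normalized histogram of grayscale values in [0.0, 1.0]."""
    if not pixels:
        raise ValueError("frame has no pixels")
    counts = [0] * bins
    for value in pixels:
        counts[min(int(value * bins), bins - 1)] += 1
    return [count / len(pixels) for count in counts]


def histogram_similarity(a: list[float], b: list[float]) -> float:
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(min(x, y) for x, y in zip(a, b))


def is_break_frame(frame: list[float], references: list[list[float]],
                   threshold: float = 0.92) -> bool:
    """True if the frame matches any reference break image at or above the threshold."""
    frame_hist = gray_histogram(frame)
    return any(
        histogram_similarity(frame_hist, gray_histogram(ref)) >= threshold
        for ref in references
    )


reference = [0.1] * 95 + [0.9] * 5   # mostly dark break screen
candidate = [0.1] * 93 + [0.9] * 7   # nearly the same distribution
slide = [0.9] * 80 + [0.1] * 20      # bright presentation slide
print(is_break_frame(candidate, [reference]), is_break_frame(slide, [reference]))
# prints: True False
```

Histograms ignore where pixels sit in the frame, which is one reason a structural comparison can be the better choice for break screens that share colors with the slides.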

Method 2: Automatic Detection (when no break images are provided)
- The detector clusters frames based on visual similarity
- Large clusters of similar frames are identified as potential break screens
- The most common frame clusters are selected as break screens
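
The clustering idea can be sketched with a simple greedy pass over per-frame histograms. Greedy clustering, the `min_share` cutoff, and the function names are assumptions chosen for illustration; the tool's actual clustering method is not specified here.

```python
def similarity(a: list[float], b: list[float]) -> float:
    """Histogram intersection between two normalized histograms."""
    return sum(min(x, y) for x, y in zip(a, b))


def cluster_frames(histograms: list[list[float]], threshold: float = 0.92) -> list[list[int]]:
    """Greedy clustering: a frame joins the first cluster whose representative it matches."""
    clusters: list[tuple[list[float], list[int]]] = []
    for index, hist in enumerate(histograms):
        for representative, members in clusters:
            if similarity(hist, representative) >= threshold:
                members.append(index)
                break
        else:
            clusters.append((hist, [index]))  # no match: start a new cluster
    return [members for _, members in clusters]


def likely_break_frames(histograms: list[list[float]], threshold: float = 0.92,
                        min_share: float = 0.2) -> list[int]:
    """Return frame indices of the largest cluster if it covers enough of the video."""
    clusters = cluster_frames(histograms, threshold)
    largest = max(clusters, key=len, default=[])
    return largest if len(largest) >= min_share * len(histograms) else []


brk = [0.95, 0.05]
frames = [brk, brk, [0.2, 0.8], [0.5, 0.5], brk, [0.7, 0.3], brk]
print(likely_break_frames(frames))
# prints: [0, 1, 4, 6]
```

A long static slide would also form a large cluster under this heuristic, which is one reason supplying explicit break images tends to be more reliable.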

3. Transition Detection

Once break screens are identified, the detector:
- Uses binary search to pinpoint exact frame transitions (improves accuracy)
- Analyzes movement between adjacent frames to confirm transitions
- Handles edge cases like brief interruptions or camera shifts
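
The binary search step can be sketched as follows, assuming two sampled frames are known to be in different states (one break, one presentation) with a single transition between them. `find_transition` and the `is_break` callback are hypothetical names.

```python
from typing import Callable


def find_transition(is_break: Callable[[int], bool], lo: int, hi: int) -> int:
    """Return the first frame index in (lo, hi] whose state differs from frame `lo`.

    Assumes exactly one transition between the two frames.
    """
    start_state = is_break(lo)
    if is_break(hi) == start_state:
        raise ValueError("frames lo and hi are in the same state")
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_break(mid) == start_state:
            lo = mid   # transition lies after mid
        else:
            hi = mid   # transition lies at or before mid
    return hi


# Break screen up to frame 1234, presentation from frame 1235 onward.
first_presentation_frame = find_transition(lambda i: i < 1235, lo=1200, hi=1260)
print(first_presentation_frame)
# prints: 1235
```

With samples 60 frames apart, this needs about six extra frame classifications per transition instead of checking every frame in the gap.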

4. Presentation Extraction

For each detected presentation segment:
- Start and end timestamps are precisely calculated
- Video is trimmed using FFmpeg with no re-encoding (when possible) for fast extraction
- Audio is extracted using FFmpeg's audio capabilities
- Metadata is generated including duration, timestamps, and filename
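
For reference, the trimming and audio steps correspond to FFmpeg invocations along these lines. This is a sketch: the exact flags the tool passes are an assumption, and `trim_command` and `audio_command` are hypothetical helpers that only build the argument lists.

```python
def trim_command(source: str, start: float, end: float, target: str) -> list[str]:
    """Build an FFmpeg command that copies the segment without re-encoding."""
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}",        # seek before opening the input (fast)
        "-i", source,
        "-t", f"{end - start:.3f}",   # duration of the segment in seconds
        "-c", "copy",                 # stream copy: no re-encoding
        target,
    ]


def audio_command(source: str, target: str) -> list[str]:
    """Build an FFmpeg command that extracts the audio track as MP3."""
    return [
        "ffmpeg", "-y", "-i", source,
        "-vn",                        # drop the video stream
        "-codec:a", "libmp3lame",     # requires an FFmpeg build with LAME
        "-q:a", "2",                  # variable bitrate quality level
        target,
    ]


cmd = trim_command("talk.mp4", 65.0, 1865.5, "talk_presentation_1.mp4")
# To execute: subprocess.run(cmd, check=True)
```

Stream copy can only cut on keyframes, so the start of a copied segment may land slightly off the requested timestamp. That may be one reason re-encoding is sometimes needed.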

Output Structure

For each processed video, a subdirectory is created in the output folder:

output_folder/
├── video1_name/
│   ├── presentations.txt       # Text file with presentation times
│   ├── video1_name_metadata.json   # JSON with detailed metadata
│   ├── video1_name_presentation_1.mp4  # First presentation video
│   ├── video1_name_presentation_1.mp3  # First presentation audio
│   ├── video1_name_presentation_2.mp4  # Second presentation video
│   └── video1_name_presentation_2.mp3  # Second presentation audio
├── video2_name/
│   └── ...
└── ...
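
The layout above can be expressed as a small path helper, which is handy when post-processing results in a script. This sketch assumes the subdirectory is named after the input file's stem and that extracted videos use the `.mp4` extension as shown in the tree; `output_paths` is a hypothetical name.

```python
from pathlib import Path


def output_paths(output_folder: str, video_file: str, index: int) -> dict[str, Path]:
    """Return the expected output locations for presentation number `index`."""
    stem = Path(video_file).stem
    video_dir = Path(output_folder) / stem
    base = f"{stem}_presentation_{index}"
    return {
        "directory": video_dir,
        "times": video_dir / "presentations.txt",
        "metadata": video_dir / f"{stem}_metadata.json",
        "video": video_dir / f"{base}.mp4",
        "audio": video_dir / f"{base}.mp3",
    }


paths = output_paths("extracted_presentations", "input_videos/keynote.mp4", 1)
print(paths["video"].name)
# prints: keynote_presentation_1.mp4
```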

Best Practices

Break Slide Recommendations

The effectiveness of presentation detection depends significantly on your break slides. Here are recommendations for good break slides:

Type	Good Examples
Static Slides	✅ High contrast ✅ Consistent layout ✅ Solid color background
Dynamic Slides	✅ Consistent elements ✅ Distinct from presentations ✅ Limited animation areas

Tips for Optimal Results

- Use distinctive break slides
  - Choose break slides that are visually very different from presentation content
  - Solid colors or simple patterns work best
  - Avoid break slides that look like presentation slides
- Consistent break screens
  - Use the same break screen throughout the recording
  - If multiple break screens are used, provide examples of each in the break-images folder
- Processing options
  - For large videos, enable resizing to speed up processing
  - Adjust the similarity threshold if detection is too aggressive or too lax
  - Use custom break images for best results
- Handling problematic videos
  - If auto-detection fails, extract a few frames of your break screens and use them as reference
  - For videos with quick transitions, adjust sampling rate in the config
  - Process in batches when dealing with many videos
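
One way to extract a reference frame of a break screen is to have FFmpeg write a single frame at a known timestamp. The sketch below only builds the command; `snapshot_command`, the file names, and the timestamp are hypothetical examples.

```python
def snapshot_command(source: str, timestamp: str, target: str) -> list[str]:
    """Build an FFmpeg command that saves one frame at `timestamp` as an image."""
    return [
        "ffmpeg", "-y",
        "-ss", timestamp,     # seek to the moment the break screen is visible
        "-i", source,
        "-frames:v", "1",     # write exactly one video frame
        target,
    ]


cmd = snapshot_command("conference.mp4", "00:45:10", "break_images/break_1.png")
# To execute: subprocess.run(cmd, check=True)
```

The folder holding the saved images can then be passed to the detector with `--break-images`.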

License

MIT