Projects In Development

Deepfake Detection System

Dec 2025 to Dec 2026

94% CNN Accuracy
3 Analysis Streams
9 Technologies Used

Overview

Deepfakes are getting scarily convincing, and most people can't tell the difference anymore. This project is my attempt to build something that can, by looking at video, audio, and timing all at once to spot the subtle signs of manipulation.

It processes video frames, audio tracks, and frame-to-frame consistency through a multi-stage pipeline, then spits out a confidence score for each piece of media. The idea is to give both media organizations and regular users a straightforward way to check if what they're watching is real.

Key Features

  • Frame-by-frame video analysis using CNNs to detect facial artifacts and lighting inconsistencies
  • Audio stream analysis for voice synthesis patterns, pitch anomalies, and lip-sync mismatches
  • Temporal consistency checking to catch inter-frame artifacts and unnatural motion
  • Real-time detection for both live streams and uploaded media files
  • Detailed reports with per-frame confidence scores and highlighted regions of concern
  • Clean web interface for uploading media and exploring results interactively
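To give a feel for the per-frame reporting above, here's a minimal sketch of how consecutive high-confidence frames could be grouped into "regions of concern." The function name, threshold, and minimum run length are illustrative assumptions, not values from the actual system:

```python
def flag_suspect_regions(frame_scores, threshold=0.7, min_run=3):
    """Group consecutive frames whose fake-confidence meets the threshold
    into (start, end) index pairs; runs shorter than min_run are treated
    as noise and ignored. Hypothetical helper for illustration only."""
    regions, start = [], None
    for i, score in enumerate(frame_scores):
        if score >= threshold and start is None:
            start = i                      # a suspect run begins
        elif score < threshold and start is not None:
            if i - start >= min_run:       # keep only sustained runs
                regions.append((start, i - 1))
            start = None
    if start is not None and len(frame_scores) - start >= min_run:
        regions.append((start, len(frame_scores) - 1))
    return regions


# A lone spike (frame 6) is dropped; the sustained run (frames 2-4) is kept.
print(flag_suspect_regions([0.1, 0.2, 0.9, 0.95, 0.8, 0.1, 0.9, 0.1]))
```

A real report would attach spatial heatmaps per region; this only shows the temporal grouping step.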

How It's Built

The detection engine uses a three-stream architecture:

  • Visual Stream: A fine-tuned EfficientNet backbone processes individual frames, looking for facial artifacts, inconsistent lighting, and compression anomalies that signal manipulation
  • Audio Stream: Mel-spectrogram analysis with a separate CNN picks up voice synthesis patterns, unnatural pitch shifts, and audio-visual sync issues
  • Temporal Stream: An LSTM layer analyzes sequences of frame-level predictions to spot inter-frame inconsistencies and flickering artifacts that single-frame analysis would miss
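To make the three-stream idea concrete, here's a toy late-fusion sketch that combines one score per stream into a single confidence. The fixed weights are purely illustrative; the actual system presumably fuses stream features in a learned layer rather than averaging final scores:

```python
def fuse_scores(visual, audio, temporal, weights=(0.5, 0.25, 0.25)):
    """Weighted late fusion of per-stream fake probabilities (each in
    [0, 1]) into one overall confidence score. Weights are assumptions
    for illustration, not the project's trained fusion."""
    w_v, w_a, w_t = weights
    score = w_v * visual + w_a * audio + w_t * temporal
    return max(0.0, min(1.0, score))       # clamp for safety


# e.g. strong visual evidence, weaker audio/temporal evidence:
print(fuse_scores(0.8, 0.6, 0.4))
```

Late fusion like this is the simplest way to combine heterogeneous streams; learned fusion can exploit cross-stream correlations (say, lip motion vs. audio) that a weighted average can't.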

FastAPI handles high-throughput inference on the backend, while OpenCV manages efficient video decoding and frame extraction. A React frontend provides a clean interface for uploading media, tracking analysis progress, and exploring per-frame heatmaps.
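On the frame-extraction side, a decoder typically seeks to a precomputed set of frame indices rather than decoding every frame. Here's a small pure-Python sketch of how those indices might be chosen; the function name, sampling rate, and cap are assumptions, and in the real pipeline an OpenCV decode loop would seek to the returned indices:

```python
def sample_frame_indices(total_frames, fps, base_rate=2.0, max_frames=120):
    """Pick evenly spaced frame indices at roughly base_rate frames per
    second of video, capped at max_frames so long clips stay cheap to
    analyze. Parameters are illustrative, not the project's settings."""
    duration_s = total_frames / fps
    n = min(max_frames, max(1, int(duration_s * base_rate)))
    step = total_frames / n                # fractional stride, then floor
    return [int(i * step) for i in range(n)]


# A 10-second clip at 30 fps yields 20 sampled frames:
print(len(sample_frame_indices(300, 30)))
```

The cap is what keeps hour-long uploads from blowing up processing time; the per-second rate keeps short clips densely covered.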

Interesting Challenges

  • Dataset Imbalance: Real-world deepfake datasets are heavily skewed toward authentic content. Stratified sampling and focal loss prevent the model from just predicting "real" every time
  • Inference Speed: Full video analysis was too slow for practical use. Keyframe extraction and adaptive frame sampling cut processing time by 60% while keeping accuracy intact
  • Generalization: A model trained on one deepfake method often fails on others. Multi-dataset training with domain randomization helps the system detect manipulation across different generation techniques
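The focal loss used against dataset imbalance can be sketched in its standard binary form. The alpha and gamma values below are the common defaults from the focal loss literature, not necessarily what this project uses:

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for a single prediction p (predicted fake
    probability) and label y (1 = fake, 0 = real). The (1 - p_t)^gamma
    factor shrinks the loss on easy, confident examples, so the abundant
    'real' class can't dominate training."""
    p_t = p if y == 1 else 1.0 - p          # probability of the true class
    a_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -a_t * (1.0 - p_t) ** gamma * math.log(max(p_t, 1e-12))


# Hard misclassified fakes cost far more than easy correct ones:
print(focal_loss(0.01, 1), focal_loss(0.99, 1))
```

With gamma = 0 and alpha = 0.5 this reduces to (half of) ordinary cross-entropy; raising gamma is what focuses training on the hard minority-class examples.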

Screenshots