Real-Time Object Detection using MobileNet SSD: A Comprehensive Report

A detailed report on a computer science project focused on real-time object detection using the MobileNet SSD architecture. It covers the project's rationale, goals, objectives, implementation details, and results. The report includes sections on requirement engineering, model implementation, performance optimization, data processing, and testing, along with a discussion of the project's market potential, innovativeness, and usefulness. The project demonstrates the implementation of a high-performance object detection system on mobile devices.

Typology: Study Guides, Projects, Research

2024/2025

Available from 04/27/2025

yash-yashu-1

BOARD OF TECHNICAL EDUCATION
DEPARTMENT OF COMPUTER SCIENCE
Real Time Object Detection using CNN
UNDER THE GUIDANCE OF:
ASHWINI. MS,
Senior scale lecturer
Department of CSE
SUBMITTED BY:
YASHWANTH KV
M.Tech
CIE-2

FACULTY SUPERVISOR

SIGNATURE

INTERNSHIP GRADING RUBRICS (CIE-2)

NAME:

EVALUATION CRITERIA | POOR | AVERAGE | GOOD | EXCELLENT
Intern’s ability to apply the skill and technical knowledge (30 MARKS) | | | |
Intern’s performance on assigned tasks and project (10 MARKS) | 0-3 | 4-6 | 7-8 | 9-10
Extent of Intern’s ability to add value to the organization through internship (10 MARKS) | | | |

SUB TOTAL (50 MARKS) | USE CASE - II (30 MARKS) | TOTAL MARKS OBTAINED

Real-Time Object Detection using MobileNet SSD

Acknowledgement

We extend our sincere gratitude to the research community whose advanced work in deep learning and computer vision has made this study possible. Special thanks to the teams at Google Research for developing and open-sourcing the MobileNet architecture and Single Shot Detector (SSD) framework, which formed the foundation of our work. We acknowledge the valuable support provided by GKV GLOBAL TECHNOLOGY for granting access to the GPU computing resources essential for training and optimizing our models. Our appreciation goes to Sir. Vijayan G for their invaluable guidance and insights throughout this research.

We thank the analyst who helped create our dataset and the volunteers who participated in our real-world testing phase. Their contributions were crucial in validating the practical applications of our system. This work was partially supported by Vijayan G, Managing Director of GKV GLOBAL TECHNOLOGY. We also thank the developers of the various open-source libraries and tools that facilitated our implementation.

Abstract

Real-time multi-object detection on resource-constrained devices presents significant challenges in balancing accuracy and computational efficiency. This paper presents an implementation of MobileNet Single Shot Detector (SSD), which achieves robust object detection while maintaining real-time performance on mobile platforms. The architecture leverages depth-wise separable convolutions from MobileNet as the backbone network, combined with the SSD framework for efficient feature extraction and object localization. Our approach achieves 22 frames per second on mobile devices while maintaining a mean Average Precision (mAP) of 68% on the dataset. The model's architecture reduces computational complexity through factorized convolutions, resulting in a 9x reduction in parameters compared to traditional CNN architectures. We introduce an adaptive feature pyramid network that dynamically adjusts feature resolution based on object scale, improving detection accuracy for small objects by 15% without significant computational overhead. Furthermore, our implementation includes a novel quantization scheme that reduces model size by 75% while maintaining accuracy within 2% of the full-precision model. Experimental results demonstrate the effectiveness of our approach across various object categories and lighting conditions, making it suitable for real-world applications such as autonomous navigation, surveillance, and augmented reality. Our contribution provides a practical solution for deploying high-performance object detection systems on mobile devices with limited computational resources.
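The parameter saving attributed to factorized convolutions can be checked with simple arithmetic. The sketch below counts the weights of one standard convolution layer and of its depthwise separable replacement (a per-channel spatial filter followed by a 1x1 pointwise convolution). The function names and the layer sizes are illustrative assumptions and are not taken from the report's code.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depthwise separable convolution (bias ignored)."""
    depthwise = kernel * kernel * c_in  # one kernel x kernel filter per input channel
    pointwise = c_in * c_out            # 1x1 convolution that mixes the channels
    return depthwise + pointwise


def reduction_factor(kernel, c_in, c_out):
    """How many times fewer weights the separable layer needs."""
    return standard_conv_params(kernel, c_in, c_out) / separable_conv_params(
        kernel, c_in, c_out
    )


if __name__ == "__main__":
    # Channel counts below are assumed values chosen only for illustration.
    for channels in (64, 256, 1024):
        print(channels, round(reduction_factor(3, channels, channels), 2))
```

For a 3x3 kernel the factor equals 1 / (1/c_out + 1/9), so it approaches 9 as the number of output channels grows. This is the per-layer effect only; the saving for a whole network also depends on its non-convolutional layers.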

Chapter 1 Introduction

Efficient and accurate object detection has been an important topic in the advancement of computer vision systems, and the field has seen remarkable progress in recent years. This project focuses on developing a system for real-time object detection, specifically targeting the identification of common objects such as people, chairs, smartphones, and water bottles. Our implementation uses Python and aims for both high accuracy and real-time performance. The input to the system is a real-time image, and the output is a bounding box around each detected object, along with the class label of the object in each box.

1.1 Rationale

Object detection is one of the most challenging problems in computer vision. There is an increasing demand for systems that can perform accurate and swift object detection in real-world scenarios, driven by the growing need for automated visual recognition across various industries and applications. Recent advances in deep learning have made real-time detection achievable, opening new possibilities for practical applications.

1.2 Goal

The goal is to develop a Python-based object detection system that can:

  • Accurately locate objects within images
  • Classify objects into appropriate categories
  • Perform detection in real-time
  • Handle multiple object detection simultaneously
  • Provide high accuracy with minimal computational overhead

1.3 Objective

Our project aims to develop a comprehensive Python-based object detection system that excels in multiple aspects of visual recognition. The system is designed to precisely locate and identify objects within images, providing accurate classification across various categories. A key focus is maintaining real-time performance while handling multiple object detection simultaneously. The

1.5 Role

Team Members:

  1. YASHWANTH KV
  • Project Planning
  2. UDAYA GIRI BR
  • Model Implementation
  3. AKASH K
  • Performance Optimization
  4. DARSHAN D AND CHANDAN GOWDA DR
  • Data Processing
  5. MUTTURAJU SM
  • Testing
  6. GAGAN GOWDA
  • Documentation

1.6 Contribution of Project

1.6.1 Market Potential

  • Growing demand for computer vision applications
  • Wide range of applications in security, retail, and automation

  • Jupyter Notebook for testing and visualization
  • Version control system (Git)

2.1.2 Libraries and Frameworks

  • argparse
  • OpenCV
  • NumPy
  • MobileNet SSD pre-trained models

2.1.3 Programming Language

  • Python 3.7 or higher

2.1.4 OS (Operating System)

  • Windows 10/
  • Linux (Ubuntu 20.04 or higher)
  • macOS (10.15 or higher)

2.2 Requirements

2.2.1 Functional Requirements

  • Real-time video input processing
  • Multiple object detection capability
  • Classification of detected objects
  • Bounding box visualization
  • Confidence score display
  • Frame rate optimization
  • Model selection flexibility
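The model selection and confidence score requirements above are the kind of settings that argparse, listed among the project libraries, can expose on the command line. The following is a minimal sketch under stated assumptions: the flag names (`--prototxt`, `--weights`, `--confidence`, `--camera`) and the default file names are hypothetical and do not come from the report.

```python
import argparse


def build_parser():
    """Build a command-line parser for a detection script (hypothetical flags)."""
    parser = argparse.ArgumentParser(
        description="Real-time object detection with MobileNet SSD (sketch)"
    )
    # Model files are passed as paths so a different model can be swapped in.
    parser.add_argument("--prototxt", default="deploy.prototxt",
                        help="path to the network definition file (assumed name)")
    parser.add_argument("--weights", default="model.caffemodel",
                        help="path to the pre-trained weights (assumed name)")
    # Detections scoring below this value would be discarded.
    parser.add_argument("--confidence", type=float, default=0.5,
                        help="minimum confidence for a detection to be shown")
    parser.add_argument("--camera", type=int, default=0,
                        help="index of the webcam to read frames from")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

A call such as `python detect.py --confidence 0.3` (script name assumed) would lower the threshold without touching the code.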

2.2.2 Non-Functional Requirements

Hardware Requirements:

  • Processor: Intel Core i5 or higher
  • RAM: 8GB minimum
  • GPU: NVIDIA or Intel Iris
  • Storage: 20GB minimum
  • Camera: HD webcam

Software Requirements:

  • Python 3.7+
  • Required Python libraries
  • Compatible operating system

Internals questions

  1. What is the purpose of the confidence_threshold parameter in the ObjectDetector class, and how does it affect the model's performance?
     Ans: The confidence threshold filters out detections whose score falls below it. Raising the threshold reduces false positives but increases false negatives, and lowering it has the opposite effect, so it sets the trade-off between the two.
  2. Why does the model resize the input frame to 300x300 pixels before processing?
     Ans: The MobileNet-SSD architecture expects a fixed 300x300 input. Resizing keeps the computation per frame small and constant, at the cost of some detail, which can lower accuracy on small objects.
  3. How does the blobFromImage function preprocess the input image, and why are the specific values (e.g., 0.007843, (127.5, 127.5, 127.5)) used?
     Ans: It performs the preprocessing the Caffe model requires: mean subtraction followed by scaling. Subtracting 127.5 from each channel and multiplying by 0.007843 (about 1/127.5) maps pixel values from [0, 255] to roughly [-1, 1], the input range the model expects.
  4. What are the limitations of using the MobileNet-SSD model for object detection, especially in real-time applications?
     Ans: The model trades accuracy for speed, can struggle with small or occluded objects, and still depends on adequate hardware to sustain real-time frame rates.
  5. How does the model handle scaling the detected bounding boxes back to the original frame size, and why is this step necessary?
     Ans: In post-processing, the box coordinates produced for the resized frame are mapped back to the original dimensions by multiplying them by the original frame's width and height. This is necessary because the boxes are drawn on the original frame, not on the 300x300 input.
  6. What modifications would be needed to add a new object class (e.g., "laptop") to the detection capabilities of this model?
     Ans: The class labels must be extended by updating the classNames dictionary, and the model may need retraining or fine-tuning on examples of the new class so that it can actually detect it.
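The preprocessing and post-processing steps discussed in questions 1, 3, and 5 can be sketched in a few lines. In an OpenCV pipeline the normalization is normally done by `cv2.dnn.blobFromImage`; the version below uses plain Python so that it runs without OpenCV. It assumes each detection is a 7-value row `(image_id, class_id, confidence, x1, y1, x2, y2)` with box corners expressed as fractions of the frame size. That layout, the function names, and the sample values are assumptions for illustration, not the report's ObjectDetector code.

```python
SCALE = 0.007843  # about 1 / 127.5
MEAN = 127.5      # subtracted from every channel


def normalize_pixel(value, scale=SCALE, mean=MEAN):
    """Map an 8-bit pixel value to roughly [-1, 1] via (value - mean) * scale."""
    return (value - mean) * scale


def filter_and_scale(detections, frame_w, frame_h, confidence_threshold=0.5):
    """Drop weak detections and convert boxes to pixel coordinates.

    Each detection is assumed to be
    (image_id, class_id, confidence, x1, y1, x2, y2) with corners in [0, 1].
    """
    results = []
    for _, class_id, confidence, x1, y1, x2, y2 in detections:
        if confidence < confidence_threshold:
            continue  # below the threshold: treated as noise
        # Multiply the fractional corners by the ORIGINAL frame size,
        # not by the 300x300 network input size.
        box = (
            round(x1 * frame_w), round(y1 * frame_h),
            round(x2 * frame_w), round(y2 * frame_h),
        )
        results.append((int(class_id), float(confidence), box))
    return results


if __name__ == "__main__":
    # Two made-up detections on a 640x480 frame; class ids are arbitrary.
    sample = [
        (0, 15, 0.92, 0.25, 0.50, 0.75, 1.00),
        (0, 9, 0.30, 0.00, 0.00, 0.50, 0.50),
    ]
    for class_id, confidence, box in filter_and_scale(sample, 640, 480):
        print(class_id, confidence, box)
```

Raising `confidence_threshold` keeps fewer boxes, which is the false positive versus false negative trade-off from question 1.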