






















The project 'song predictor based on mood' is a novel approach that helps the user to automatically play songs based on the emotions of the user. It aims to provide user-preferred music with emotion awareness. The proposed system defines a practical approach to solve the issue of playing music based on the user's mindset. The application analyzes the user's emotions by reading their physical facial features and then plays music that matches the detected mood of the individual. This project effectively integrates the aspects of music being a source of entertainment and its ability to reinforce the mind and help users relax. The system includes key modules such as user mood input, song recommendation, song database, and user feedback to provide an emotion-aware music experience.























A Project Report on Submitted in Partial Fulfillment of the Requirements for the Award of the Degree of
By Pratik singh Omkar singh
Ms. Assistant Professor DEPARTMENT OF INFORMATION TECHNOLOGY ZCT’S THAKUR SHYAMNARAYAN DEGREE COLLEGE (Affiliated to University of Mumbai) MUMBAI 400101 MAHARASHTRA 2023-
PRN No.: Roll no.: 1.Name of students : I.Pratik singh II.Omkar singh
Date: College Seal
Human beings are social animals. Unlike most animals, which cannot convey the full range of their feelings through facial expressions, humans can express their mental states with their faces. Facial expressions are therefore important cues for identifying a person's feelings and intentions. Humans read these cues easily, and recent research in Artificial Intelligence (AI) and computer vision has now made it possible for a computer to pick out fine facial details and label them as a known human behavior. These details can be captured from the live feed of a digital camera.
Music has been a source of entertainment and leisure since ancient times. Recent studies also suggest that music helps people to strengthen their minds and to relax, so it is useful for soothing a person's mind. This project integrates these two aspects to create an application that plays music to calm the user based on their mood. Such a system can help to lower the user's stress levels through music therapy.
In recent times, many people face conditions such as hypertension and anxiety, which lead to a very stressful life, and the coronavirus pandemic made the situation worse. This stress is caused by issues such as balancing income and spending, pressure at the workplace, and so on. Music reduces stress through acoustics: for example, temples use bells and gongs whose frequencies have a calming effect on the people inside the temple premises. The effect of music is greatest, however, when the beat of the music matches the mood of the listener. Thanks to advances in technology, music is now available at our fingertips, but unfortunately there are no effective music applications that play music based on the user's mindset.
To address this problem, the proposed system defines a practical approach and creates an effective music player that plays music based on the emotions of the user. Music is important not only in people's daily lives but also in today's technological society, yet users usually have to explore and select playlists themselves. We propose an efficient and accurate model for generating playlists according to the user's current emotional state and behavior. Existing approaches to automatic playlist creation are computationally inefficient and inaccurate, and may require additional equipment such as EEG devices and sensors. Spoken language is the oldest and most natural way to convey emotions, feelings and moods, but processing it is computationally intensive, time-consuming and costly.
CHAPTER 1
This application analyses the user's emotions by reading their physical facial features and then plays music that matches the detected mood of the individual. For instance, if the user is feeling sad, the application will play an upbeat, happy song to lift his or her mood. Existing systems employ algorithms that yield unpredictable results and therefore low accuracy. They also run continuously in the background, which wastes resources and makes them inefficient. Audio analysis systems cannot gather significant information from audio signals in a short time. Existing designs also rely on supplementary sensing devices or on human speech, which is quite inefficient. These systems cannot properly associate the perception of the user with music.
An individual's facial expression can be determined by comparing it with known, comparable expressions. In 2005 Mary Duenwald published an article summarizing the findings of various scientific studies, which reported that there are essentially seven categories of facial expression found across the world. Of these, the following are the basic ones:
i. Sadness: When a person is sad, the eyebrows come closer together while the inner parts of the eyelids go up. The corners of the lips form a downward circular arc. The lower lip may also push up to form a pout.
ii. Anger: When a person is angry, the upper and lower eyelids squeeze towards each other, leaving the eyeball visible through a small slit between them. In addition, the lower lip pushes up a little and the upper and lower lips press against each other, causing the jaw to move forward.
iii. Happiness: When a person is happy, the side edges of the lips are raised and the mouth is shaped like the bottom arc of a semicircle. At the same time, the eyelids move slightly closer together, the cheeks rise with the smile, and the eyebrows go down slightly.
iv. Calm: In a calm face, the mouth and the eyelids are in their normal resting positions. This is the default emotion for the project.
Next, we explain the Requirements Analysis in more detail, followed by the Survey of Technologies.
CHAPTER 2
The features available in the existing music players on computer systems are as follows:
i. Manual selection of songs
ii. Party shuffle
iii. Playlists
iv.
v. Music squares, where the user has to classify the songs manually according to only four basic emotions.
The four basic emotions are:
i. Sadness
ii. Anger
iii. Happiness
iv. Calm
Here we propose an emotion-based music player. Song predictor based on mood is a music player that plays songs according to the emotion of the user. It aims to provide user-preferred music with emotion awareness, and it is based on the idea of automating much of the interaction between the music player and its user. Emotions are recognized using a machine learning method, the Support Vector Machine (SVM) algorithm. In machine learning, support vector machines are supervised learning models, with associated learning algorithms, that analyse data for classification and regression. An SVM finds an optimal boundary between the possible outputs. The training dataset we used is Olivetti faces, which contains 400 face images together with their target values. The webcam captures an image of the user, and the system then extracts the user's facial features from the captured image. Training starts from some initial parameter values, predicts an output with them (for example, smiling or not smiling), compares each prediction with the known label, and adjusts the values so that the predictions match the labels more closely. Evaluation tests the model against data that was never used for training and is meant to be representative of how the model might perform in the real world. According to the detected emotion, music is then played from the predefined directories.
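The predict, compare and adjust loop described above can be illustrated with a linear SVM trained by sub-gradient descent on the hinge loss. This is only a minimal sketch under stated assumptions: the two features (mouth curvature, eye openness) and the six training samples are hypothetical toy values, not the Olivetti faces, and the function names `train_linear_svm` and `predict` are illustrative rather than the project's actual implementation.

```python
def train_linear_svm(samples, labels, epochs=200, lr=0.1, reg=0.01):
    """Fit weights w and bias b by sub-gradient descent on the
    L2-regularised hinge loss. Labels must be +1 or -1."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            # Predict with the current values and compare with the label.
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Wrong side or too close to the boundary: adjust w and b.
                w = [wi - lr * (reg * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Correct with enough margin: only the regulariser acts.
                w = [wi - lr * reg * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 (smiling) or -1 (not smiling)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical features: (mouth curvature, eye openness); +1 = smiling.
train_x = [(0.9, 0.6), (0.8, 0.7), (0.7, 0.5),
           (0.1, 0.5), (0.2, 0.6), (0.0, 0.4)]
train_y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(train_x, train_y)

# Evaluation uses samples that were never seen during training.
test_x = [(0.85, 0.55), (0.15, 0.55)]
test_pred = [predict(w, b, x) for x in test_x]
```

In practice a library implementation of SVMs would normally replace this hand-written loop; the sketch only makes the training and evaluation steps concrete.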
a) Users do not have to select songs manually.
b) No playlist is needed.
c) Users do not have to classify songs by emotion.
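Playing music from predefined directories without manual selection could look roughly like the sketch below. It assumes a hypothetical layout with one folder per emotion under a music root (for example `music/happy`, `music/calm`), falls back to the calm folder because calm is the project's default emotion, and only returns the chosen file; the names `pick_song` and `AUDIO_SUFFIXES` are illustrative, and actual playback is left out.

```python
import random
from pathlib import Path

# Assumed set of audio file extensions; adjust to the real song database.
AUDIO_SUFFIXES = {".mp3", ".wav", ".ogg"}


def pick_song(music_root, emotion, default="calm"):
    """Return a random audio file from the folder named after the emotion.

    Falls back to the default emotion's folder, and returns None when
    no suitable folder or song exists.
    """
    root = Path(music_root)
    folder = root / emotion
    if not folder.is_dir():
        folder = root / default
    if not folder.is_dir():
        return None
    songs = [p for p in folder.iterdir() if p.suffix.lower() in AUDIO_SUFFIXES]
    return random.choice(songs) if songs else None
```

A call such as `pick_song("music", "happy")` would then return one file from the `happy` folder, which the player can hand to whatever audio backend it uses.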
A Convolutional Neural Network (CNN) is one of the main categories of neural network used for image classification and image recognition. Scene labeling, object detection and face recognition are some of the areas where CNNs are widely used. A CNN takes an image as input, processes it, and classifies it under a certain category such as dog, cat, lion or tiger. The computer sees an image as an array of pixels whose size depends on the resolution of the image: h * w * d, where h = height, w = width and d = depth (the number of color channels). For example, an RGB image may be a 6 * 6 * 3 array, and a grayscale image a 4 * 4 * 1 array. In a CNN, each input image passes through a sequence of convolution layers with filters (also known as kernels), pooling layers and fully connected layers. Finally, the softmax function is applied to classify the object with probability values between 0 and 1.
Convolving a 5 * 5 image matrix with a 3 * 3 filter matrix produces an output called the "feature map". Convolving an image with different filters can perform operations such as blurring, sharpening and edge detection.
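As a small worked example of the operation just described, the sketch below slides a 3 * 3 filter over a 5 * 5 image with stride 1 and no padding, which yields a 3 * 3 feature map. The pixel and filter values are arbitrary illustration data, and the function name `convolve2d` is an assumption.

```python
def convolve2d(image, kernel):
    """Slide a square kernel over the image (stride 1, no padding).

    As in most CNN libraries, the kernel is not flipped, so strictly
    speaking this computes a cross-correlation.
    """
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the window element-wise with the kernel and sum.
            total = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(k)
                for b in range(k)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]
feature_map = convolve2d(image, kernel)
# feature_map == [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```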
Stride is the number of pixels by which the filter shifts over the input matrix. When the stride is 1, the filter moves 1 pixel at a time; when the stride is 2, it moves 2 pixels at a time. The following figure shows how the convolution works with a stride of 2.
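The effect of the stride can be seen by listing where the left edge of the filter lands along one axis. The sketch below assumes a hypothetical input that is 7 pixels wide and a filter that is 3 pixels wide; the function name `window_starts` is illustrative.

```python
def window_starts(input_size, filter_size, stride):
    """Positions of the filter's left edge along one axis (no padding)."""
    return list(range(0, input_size - filter_size + 1, stride))


starts_stride_1 = window_starts(7, 3, 1)  # [0, 1, 2, 3, 4]
starts_stride_2 = window_starts(7, 3, 2)  # [0, 2, 4]
```

The number of positions is the output width, so a stride of 2 reduces the output here from 5 to 3 values per row.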
Padding plays a crucial role in building a convolutional neural network. Each convolution shrinks the image, so a network with hundreds of layers would leave only a very small image after the final filter. What happens when we convolve a three-by-three filter over a grayscale image? It is clear from the above picture that a pixel in the corner is covered only once, whereas a pixel in the middle is covered more than once. We therefore have more information about the middle pixels, which gives two downsides:
o Shrinking outputs
o Losing information at the corners of the image.
To overcome this, padding is introduced. "Padding is an additional layer that can be added to the border of an image."
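A minimal sketch of zero padding follows, assuming a 4 * 4 grayscale image and a 3 * 3 filter. It uses the standard output-size relation (input + 2 * pad - filter) // stride + 1; the helper names `zero_pad` and `output_size` are illustrative.

```python
def zero_pad(image, pad):
    """Surround a 2-D image (list of rows) with `pad` layers of zeros."""
    width = len(image[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]
    for row in image:
        padded.append([0] * pad + list(row) + [0] * pad)
    padded.extend([0] * width for _ in range(pad))
    return padded


def output_size(input_size, filter_size, stride=1, pad=0):
    """Output length along one axis after convolution."""
    return (input_size + 2 * pad - filter_size) // stride + 1


image = [[1, 2, 3, 4] for _ in range(4)]   # 4 * 4 grayscale image
padded = zero_pad(image, 1)                # becomes 6 * 6

size_without_padding = output_size(4, 3)          # 2: the output shrinks
size_with_padding = output_size(4, 3, pad=1)      # 4: same as the input
```

With one layer of padding, the corner pixels of the original image are also covered by more than one filter position, which addresses both downsides listed above.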
The fully connected layer is a layer in which the output of the previous layers is flattened into a vector and passed on. It transforms that vector into the desired number of output classes. In the above diagram, the feature map matrix is converted into a vector x1, x2, x3, ..., xn, which is fed to the fully connected layers. These layers combine the features to create a model, and an activation function such as softmax or sigmoid then classifies the output as a car, dog, truck, etc.
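The flatten, fully connected and softmax steps can be sketched as follows. The feature map, weights and biases are made-up, untrained numbers chosen only for illustration, and the four class names are borrowed from the project's basic emotions rather than from any trained model.

```python
import math


def flatten(feature_map):
    """Turn a 2-D feature map into a vector x1, x2, ..., xn."""
    return [value for row in feature_map for value in row]


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum per output class."""
    return [
        sum(w * x for w, x in zip(class_weights, inputs)) + bias
        for class_weights, bias in zip(weights, biases)
    ]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


classes = ["happiness", "sadness", "calm", "anger"]
feature_map = [[1.0, 0.0], [0.5, 2.0]]
weights = [                      # hypothetical, untrained values
    [0.2, -0.1, 0.4, 0.3],
    [-0.3, 0.2, 0.1, -0.2],
    [0.1, 0.1, -0.2, 0.0],
    [0.0, 0.3, -0.1, 0.1],
]
biases = [0.0, 0.1, -0.1, 0.0]

probabilities = softmax(dense(flatten(feature_map), weights, biases))
predicted = classes[probabilities.index(max(probabilities))]
```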
Python is a general-purpose, interpreted, interactive, object-oriented, high-level programming language. It was created by Guido van Rossum, who began work on it in the late 1980s; the first version was released in 1991. Python source code is freely available under the Python Software Foundation License, an open-source license that is compatible with the GNU General Public License (GPL).
2.4. HISTORY OF PYTHON
Python was developed by Guido van Rossum in the late eighties and early nineties at the National Research Institute for Mathematics and Computer Science in the Netherlands.
Python is derived from many other languages, including ABC, Modula-3, C, C++, Algol-68, SmallTalk, the Unix shell and other scripting languages. Python is copyrighted, and its source code is available under an open-source license. It is now maintained by a core development team, although Guido van Rossum long held a vital role in directing its progress.
Easy-to-learn − Python has few keywords, a simple structure and a clearly defined syntax. This allows the student to pick up the language quickly.
Easy-to-read − Python code is clearly defined and easy on the eyes.
Easy-to-maintain − Python's source code is fairly easy to maintain.
A broad standard library − The bulk of Python's library is very portable and cross-platform compatible on UNIX, Windows and Macintosh.
Portable − Python can run on a wide variety of hardware platforms and has the same interface on all platforms.
GUI Programming − Python supports GUI applications that can be created and ported to many system calls, libraries and windowing systems, such as Windows MFC, Macintosh and the X Window System of Unix.
Databases − Python provides interfaces to all major commercial databases.
Scalable − Python provides a better structure and support for large programs than shell scripting.
It supports functional and structured programming methods as well as OOP. It can be used as a scripting language or can be compiled to byte-code for building large applications. It provides very high-level dynamic data types and supports dynamic type checking. It supports automatic garbage collection.
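To make these points concrete, the short sketch below computes the same total playing time in a structured, a functional and an object-oriented style, and then rebinds one name to values of different types. The song durations and the `Playlist` class are hypothetical illustration data.

```python
from functools import reduce

durations = [200, 180, 220]  # hypothetical song lengths in seconds


def total_structured(values):
    """Structured style: an explicit loop with a running total."""
    total = 0
    for value in values:
        total += value
    return total


def total_functional(values):
    """Functional style: fold the list with a small function."""
    return reduce(lambda acc, value: acc + value, values, 0)


class Playlist:
    """Object-oriented style: data and behaviour live together."""

    def __init__(self, values):
        self.durations = list(values)

    def total(self):
        return sum(self.durations)


# Dynamic typing: the same name can refer to objects of different types,
# and the type is checked when the code runs.
setting = 42
setting = "forty-two"
```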
Description: Users shall be able to provide feedback on recommended songs. Specification: The system shall include a feedback mechanism allowing users to like or dislike songs, and this feedback shall be used to improve future recommendations.
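One possible shape for such a feedback mechanism is sketched below. It is an assumption, not the report's design: each like or dislike shifts a per-song score, and recommendations are drawn at random with weights that double for each net like and halve for each net dislike. The class name `FeedbackStore` and the song titles are hypothetical.

```python
import random
from collections import defaultdict


class FeedbackStore:
    """Keeps a net like/dislike score per song and biases later picks."""

    def __init__(self):
        self.scores = defaultdict(int)

    def record(self, song, liked):
        """Store one piece of user feedback."""
        self.scores[song] += 1 if liked else -1

    def weight(self, song):
        # Weights stay positive, so disliked songs become rare
        # rather than impossible.
        return 2.0 ** self.scores[song]

    def recommend(self, songs, rng=random):
        """Pick one song, favouring those the user liked."""
        weights = [self.weight(song) for song in songs]
        return rng.choices(songs, weights=weights, k=1)[0]


store = FeedbackStore()
store.record("song_a", liked=True)
store.record("song_a", liked=True)
store.record("song_c", liked=False)
candidates = ["song_a", "song_b", "song_c"]
choice = store.recommend(candidates)
```

Here `song_a` ends up with weight 4.0, `song_b` with 1.0 and `song_c` with 0.5, so liked songs are recommended more often over time.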
Software Requirements:
Hardware Requirements:
Here is a detailed plan and schedule for the development of the song predictor based on mood, organized by the specified modules:
1. Introduction: Task: Define the purpose, objectives, stakeholders, and constraints of the project. Timeline: 1 week