









Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Module 1 of Kerala Technological University (KTU).
Typology: Slides
1 / 17
This page cannot be seen from the preview
Don't miss anything!










PreparedByAbinPhilip,AsstProf,TocH.
Module 1 I–Syllabus 1 .1Introduction to Machine Learning, 1 .2 Examples of Machine Learning applications – 1 .3 Learning associations, 1 .4Classification, 1 .5Regression, 1 .6UnsupervisedLearning, 1 .7ReinforcementLearning. 1. 8 Supervised learning- 1 .9 Inputrepresentation, 1. 1 0 Hypothesisclass, 1. 1 1Version space, 1. 1 2 Vapnik- Chervonenkis(VC) Dimension
WhatdoyoumeanbyMachineLearning?
“the field ofstudy thatgivescomputersthe ability to learn withoutbeing explicitly programmed.” - ArthurSamuel
Machinelearningisanapplicationofartificialintelligence(AI) thatprovidessystemsthe ability to automatically learn and improve from experience without being explicitly programmed.Machinelearningfocusesonthedevelopmentofcomputerprogramsthatcan accessdataanduseitlearnforthemselves. The process of learning begins with observations or data, such as examples, direct experience,orinstruction,inordertolookforpatternsindataandmakebetterdecisions inthefuturebasedontheexamplesthatweprovide.Theprimaryaim isto allow the computerslearnautomaticallywithouthumaninterventionorassistanceandadjustactions accordingly. TypesofMachineLearning
● Supervised–Classification,Regression,Associationlearning ● Unsupervised–Clustering ● ReinforcementLearning–Qlearning
Acomputerprogramissaidto learnfromexperienceEwithrespecttosomeclassoftasksT andperformancemeasureP,ifitsperformanceattasksT,asmeasuredbyP,improves withexperienceE.
Example
Handwritingrecognitionlearningproblem
Explainassociationrulelearningwithanexample
Inthecaseofretail—forexample,asupermarketchain—oneapplicationofmachinelearning is basketanalysis,whichisfindingassociationsbetweenproductsboughtbycustomers:If peoplewhobuy Xtypicallyalsobuy Y,andifthereisacustomerwhobuys Xanddoesnot buy Y,thenisheorsheapotential Ycustomer.Oncewefindsuchcustomers,wecan targetthemforcross-selling.
Infindingan associationrule,weareinterestedinlearningaconditionalprobabilityofthe
PreparedByAbinPhilip,AsstProf,TocH.
form P(Y| X) where Yistheproductwewouldliketoconditionon X,whichistheproduct orthesetofproductswhichweknowthatthecustomerhasalreadypurchased.
Letussay,goingoverourdata,wecalculatethat P(chips|beer ) =0. 7.
Then,wecandefinetherule:
7 0percentofcustomerswhobuybeeralsobuychips.
Wemaywanttomakeadistinctionamongcustomersandtowardthis,estimate P(Y| X,D) where Disthesetofcustomerattributes,forexample,gender,age,maritalstatus,andso on,assumingthatwehaveaccesstothisinformation
Explainthe2typesofSupervisedlearningproblems(classificationandregression) ExplainClassificationproblemwithexamples Differentiatebetweenbinaryandmulticlassclassification Explainpatternrecognitiontechniqueanditsapplications WhatdoyoumeanbyOutlierdetection? Whatdoyoumeanbydiscriminantfunctionincaseofclassification?
Inmachinelearning, classificationistheproblemofidentifyingtowhichsetofcategoriesa newobservationbelongsto,onthebasisofatrainingsetofdatacontainingobservations (orinstances) whosecategorymembershipisknown.
Considerthefollowingexample - Itisimportantforthebanktobeabletopredictin advancetheriskassociatedwithaloan,whichistheprobabilitythatthecustomerwill defaultandnotpaythewholeamountback.In creditscoringthebankcalculatestherisk giventheamountofcreditandtheinformationaboutthecustomer.Theinformationabout thecustomerincludesdatawehaveaccesstoandisrelevantincalculatinghisorher financialcapacity—namely, income, savings, collaterals, profession, age, pastfinancial history,andsoforth.Thebankhasarecordofpastloanscontainingsuchcustomerdata andwhethertheloanwaspaidbackornot.Fromthisdataofparticularapplications,the aimistoinferageneralrulecodingtheassociationbetweenacustomer’sattributesandhis risk.Thatis,themachinelearningsystem fitsamodeltothepastdatatobeableto calculatetheriskforanewapplicationandthendecidestoacceptorrefuseitaccordingly. (canshortentheexampleandwrite)
Thisisanexampleofa classificationproblemwhere there are two classes (binary classification):- low-riskandhigh-riskcustomers.Theinformation about a customer makes up the input to the classifierwhosetaskistoassigntheinputtoone ofthetwoclasses.
Aftertrainingwiththepastdata,aclassification rulelearnedmaybeoftheform
IF income > θ 1 AND savings > θ 2 THEN low-risk ELSEhigh-risk
PreparedByAbinPhilip,AsstProf,TocH.
Anotherdifferenceofspeechisthattheinputis temporal;wordsareutteredintime asasequenceofspeechphonemesandsomewordsarelongerthanothers ● Biometricsisrecognitionorauthenticationofpeopleusingtheirphysiologicaland/or behaviouralcharacteristics thatrequires an integration ofinputs from different modalities. Examples of physiologicalcharacteristics are images of the face, fingerprint,iris,andpalm;examplesofbehavioralcharacteristicsaredynamicsof signature,voice,gait,andkeystroke.
KnowledgeExtraction–Learningarulefromdataallows knowledgeextraction.Theruleisa simplemodelthatexplainsthedata,andlookingatthismodelwehaveanexplanationabout theprocessunderlyingthedata.Forexample,oncewelearnthediscriminantseparatinglow
OutlierDetection-Anotheruseofmachinelearningis outlierdetection,whichisfindingthe instancesthatdonotobeytheruleandareexceptions.Inthiscase,afterlearningtherule, wearenotinterestedintherulebuttheexceptionsnotcoveredbytherule,whichmay implyanomaliesrequiringattention— forexample,fraud.
Compression-Learningalsoperforms compressioninthatbyfittingaruletothedata,we getanexplanationthatissimplerthanthedata,requiringlessmemorytostoreandless computationtoprocess.Onceyouhavetherulesofaddition,youdonotneedtoremember thesumofeverypossiblepairofnumbers.
SomeotherclassificationapplicationincludesSpamFiltering–wherethetaskistoclassifya mailasspam ornotbasedonvariousattributes, NaturalLanguageprocessing, machine Translation.
DifferentiatebetweenClassificationandRegressiontechnique ➢ Explainregression(linearandpolynomial) withexample
Inmachinelearning,a regressionproblemistheproblemofpredictingthevalueofanumeric variablebasedonobservedvaluesofthevariable.Thevalueoftheoutputvariablemaybe anumber,suchasanintegerorafloatingpointvalue.Theseareoftenquantities,suchas amountsandsizes.Theinputvariablesmaybediscreteorreal-valued.
Letussaywewanttohaveasystemthatcanpredictthepriceofausedcar.Inputsare thecarattributes—brand,year,enginecapacity,mileage,andotherinformation—thatwe believeaffectacar’sworth.Theoutputisthepriceofthecar.Suchproblemswherethe outputisanumberare regressionproblems.
PreparedByAbinPhilip,AsstProf,TocH.
Let Xdenotethecarattributesand Ybethepriceofthecar.Againsurveyingthepast transactions,wecancollectatrainingdataandthemachinelearningprogramfitsafunction tothisdatatolearn Yasafunctionof X.Anexampleisgiveninfigurebelowwherethe fittedfunctionisoftheform y= wx+ w 0 ,forsuitablevaluesof wand w 0.
The approach in machine learningisthatwe assume a modeldefined up to a setof parameters:
y= g(x| θ)
where g(· ) isthemodeland θareitsparameters.
Yisanumberinregressionandisaclasscode(e.g., 0 / 1 ) inthecaseofclassification.
g(· ) is the regression function and in classification, itis the discriminantfunction separatingtheinstancesofdifferentclasses.
Themachinelearningprogram optimizestheparameters, θ, suchthattheapproximation errorisminimized,thatis,ourestimatesareascloseaspossibletothecorrectvalues giveninthetrainingset.( Refermodule3toseehowtheerroriscalculated) Forexampleinfigure,themodelislinear ( ie: y= wx+ w 0 )and wand w 0 are the parameters optimized for best fit to the training data.
PreparedByAbinPhilip,AsstProf,TocH.
Regression Classification
● Regression is used to predictcontinuousvalues.
● Examples – once a modelis trained basedonsampledata
o Predicting price of a house giventhearea,noofbedrooms o Predicting amount of rainfall giventemperature,humidityetc
● Classificationisusedtopredictwhich classadatapointispartof(discreet value) ● Example–
o Classifyingmailasspam ornot spam o Identifying a fruit based on size, color, length , diameter etc o Identifyingifatumorisbegin ormalignant
Both regression and classification are supervised learningproblemswherethereisan input, X,anoutput, Y,andthetaskistolearnthemappingfrom theinputtothe output.
Theapproachinmachinelearningisthatweassumeamodeldefineduptoasetof parameters:
y= g(x| θ) , where g(· ) isthemodeland θareitsparameters. Y is a number in regression and is a class code (e.g., 0 / 1 ) in the case of classification.
g(· ) isthe regression function and in classification, itisthe discriminantfunction separatingtheinstancesofdifferentclasses.
CompareSupervisedandUnsupervisedlearningwithexample ExplainUnsupervisedlearningwithexample ExplainsomeapplicationsofUnsupervisedlearning
● Insupervisedlearning,theaimistolearnamappingfromtheinputtoanoutput whosecorrectvaluesareprovidedbyasupervisor.Inunsupervisedlearning,thereis nosuchsupervisorandweonlyhaveinputdata. ● Theaimistofindtheregularitiesintheinput.Thereisastructuretotheinput spacesuchthatcertainpatternsoccurmoreoftenthanothers,andwewanttosee whatgenerally happensand whatdoesnot. In statistics, thisiscalled density estimation.
Example-Clustering(refermodule6fordetaileddescription)
● Clusteringisthetaskofdividingthepopulationordatapointsintoanumberof groupssuchthatdatapointsinthesamegroupsaremoresimilartootherdata
PreparedByAbinPhilip,AsstProf,TocH.
pointsinthesamegroupthanthoseinothergroups.Insimplewords,theaimisto segregategroupswithsimilartraitsandassignthemintoclusters. ● Clusteringhasalargeno.ofapplicationsspreadacrossvariousdomains.Someof themostpopularapplicationsofclusteringare: oRecommendationengines oMarketsegmentation oSocialnetworkanalysis oSearchresultgrouping oMedicalimaging oImagesegmentation oAnomalydetection
ClusteringforImageCompression
Inthiscase,theinputinstancesareimagepixelsrepresentedasRGBvalues.Aclustering programgroupspixelswithsimilarcolorsinthesamegroup,andsuchgroupscorrespondto thecolorsoccurringfrequentlyintheimage.Ifinanimage,thereareonlyshadesofa smallnumberofcolors,andifwecodethosebelongingtothesamegroupwithonecolor, forexample,theiraverage,thentheimageisquantized.Letussaythepixelsare24bits torepresent16millioncolors,butifthereareshadesofonly64maincolors,foreach pixelweneed6bitsinsteadof 24 .Forexample,ifthescenehasvariousshadesofbluein differentpartsoftheimage,andifweusethesameaverageblueforallofthem,welose thedetailsintheimagebutgainspaceinstorageandtransmission.
ClusteringforDocumentClustering
In documentclustering,theaimistogroupsimilardocuments.Forexample,newsreports canbesubdividedasthoserelatedtopolitics,sports,fashion,arts,andsoon.Commonly, adocumentisrepresentedasa bagofwords,thatis,wepredefinealexiconof Nwords andeachdocumentisan N-dimensionalbinaryvectorwhoseelement iis1ifword iappears inthedocument;suffixes“–s” and“–ing” areremovedtoavoidduplicatesandwords suchas“of,” “and,” andsoforth,whicharenotinformative,arenotused.Documents arethengroupeddependingonthenumberofsharedwords.
PreparedByAbinPhilip,AsstProf,TocH.
ExplainReinforcementlearningwithanexample,howisitdifferentfromSupervised andUnsupervisedlearning.
Insomeapplications,theoutputofthesystemisasequenceof actions.Insuchacase,a singleactionisnotimportant; whatisimportantisthe policythatisthesequenceof correctactionstoreachthegoal.
InReinforcementlearningscenariothereisa decision maker, called the agent, that is placedinan environment.Atanytime,the environment is in a certain state .The decisionmakerhasasetof actionspossible. Once an action is chosen and taken, the state changes. The solution to the task requiresasequenceofactions,andweget feedback, in the form of a reward. The learning agentlearnsthe bestsequence of actionstosolveaproblemwhere“best” isquantifiedasthesequenceofactionsthathas themaximumcumulativereward.Suchisthesettingof reinforcementlearning.
The mathematicalframework for defining a solution in reinforcement learning scenario is calledMarkovDecisionProcess.Thiscanbedesignedas: ● Setofstates,S ● Setofactions,A ● Rewardfunction,R ● Policy,π ● Value,V Wehavetotakeanaction(A) totransitionfromourstartstatetoourendstate( S). Inreturngettingrewards(R) foreachactionwetake.Ouractionscanleadtoapositive rewardornegativereward. Thesetofactionswetookdefineourpolicy(π) andtherewardswegetinreturndefines ourvalue(V).Ourtaskhereistomaximizeourrewardsbychoosingthecorrectpolicy.
Agoodexampleis gameplayingwhereasinglemovebyitselfisnotthatimportant;itisthe sequenceofrightmovesthatisgood.Amoveisgoodifitispartofagoodgameplaying policy.
Arobotnavigatinginanenvironmentinsearchofagoallocationisanotherapplicationarea ofreinforcementlearning. Atany time, the robotcan move in one ofa numberof directions.Afteranumberoftrialruns,itshouldlearnthecorrectsequenceofactionsto reachtothegoalstatefromaninitialstate,doingthisasquicklyaspossibleandwithout hittinganyoftheobstacles.
PreparedByAbinPhilip,AsstProf,TocH.
Otherexamplesinclude
● AdaptiveTrafficsignaloptimization ● Adaptivepowergriddistribution
HowdowelearnaClassfrompositiveandnegativeexamples WhatdoyoumeanbyLearningfromaclassofexamples ExplainInputRepresentationwithanexample ExplainTrainingsetwithanexample WhatdoyoumeanbyhypothesisClass,howcanwesetanhypothesis Whatdoyoumeanbyempiricalerror,explainwithanexample ExplainthecasesofGeneralizationandSpecializedHypothesiswithexample. ExplainconceptofVersionSpacewithexample WhyisitconsideredbesttochooseaMargininbetweenoftheVersionSpace ➢ Howdoesdoubtsarisewhenlabellingdatasamples
LearningfromaClassofExamples Supposewewanttolearnthe class,C,ofa“familycar.” Wehaveasetofexamplesof cars,andwehaveagroupofpeoplethatwesurveytowhom weshow thesecars.The peoplelookatthecarsandlabelthem;thecarsthattheybelievearefamilycarsare positiveexamples,andtheothercarsare negativeexamples.
Classlearningisfindingadescriptionthatissharedbyallpositiveexamplesandnoneofthe negativeexamples.Doingthis,wecanmakeaprediction:Givenacarthatwehavenotseen before,bycheckingwiththedescriptionlearned,wewillbeabletosaywhetheritisa familycarornot.
Afterananalysisexpertsreachconclusionthatamongallfeaturesacarmayhave,the featuresthatseparateafamilycarfromothercarsarethepriceandenginepower.These twoattributesarethe inputstotheclassrecognizer.Notethatwhenwedecideonthis particular input representation, we are ignoring various other attributes as irrelevant. (other featureslikeseatingcapacity,mileageetchavebeen ignoredforsimplicity)
Training set for the class of a “family car.”(left)
Eachdatapointcorrespondstooneexamplecar, andthecoordinatesofthepointindicatetheprice and engine power of that car. ‘+’ denotes a positiveexampleoftheclass(afamilycar),and ‘−’denotesanegativeexample(notafamilycar);
Wecandenotepriceasthefirstinputattribute x 1 andenginepowerasthesecondattribute x 2 (e.g.,
PreparedByAbinPhilip,AsstProf,TocH.
InreallifewedonotknowC (x),sowecannotevaluatehowwell h(x) matchesC (x). WhatwehaveisthetrainingsetX,whichisasmallsubsetofthesetofallpossible x.
The empiricalerroristheproportionoftraininginstanceswhere predictionsof hdonot match the required valuesgiven in X.(thatisforexamplewhen afamilycarisnot identifiedasafamilycarbythehypothesis). Theerrorofhypothesis hgiventhetrainingsetXis
,where1 (a= b) is1if a= bandis0if a= b Inourfamilycarexample,thehypothesisclassHisthesetofallpossiblerectangles.Each
quadruple (p 1
h ,p 2
h ,e 1
h ,e 2
h ) definesonehypothesis, h,from H ,andweneedto
choose the bestone, orin otherwords, we need to find the valuesofthese four parametersgiventhetrainingset,toincludeallthepositiveexamplesandnoneofthe negativeexamples.Thereareinfinitelymanysuch hforwhichthisissatisfied,namely,for whichtheerror, E,is 0 ,
Butgivenafutureexamplesomewhereclosetotheboundarybetweenpositiveandnegative examples,differentcandidatehypothesesmaymakedifferentpredictions.Thisistheproblem of generalization—thatis,how wellourhypothesiswillcorrectlyclassifyfutureexamples thatarenotpartofthetrainingset.
Mostspecifichypothesis, S,thatisthetightest rectangle thatincludesallthe positive examples andnoneofthenegativeexamplesThisgivesus one hypothesis, h = S, asourinduced class. NotethattheactualclassCmaybelargerthan S butisneversmaller.The mostgeneralhypothesis, G, is the largest rectangle we can draw that includesallthepositiveexamplesandnoneofthe negativeexamplesAny h∈ Hbetween Sand Gis a valid hypothesis with no error, said to be consistentwiththetrainingset,andsuch hmake
DependingonXandH,theremaybeseveral Si and Gjwhichrespectivelymakeupthe S-setandthe G-set.Everymemberofthe S-setis consistentwithalltheinstances,andtherearenoconsistenthypothesesthataremore specific.Similarly,everymemberofthe G-setisconsistentwithalltheinstances,and therearenoconsistenthypothesesthataremoregeneral.Thesetwomakeuptheboundary setsandanyhypothesisbetweenthemisconsistentandispartoftheversionspace.
GivenX, wecanfind S, or G, orany hfrom theversionspaceanduseitasour hypothesis, h.Itseemsagoodoptiontochoose hhalfwaybetween Sand G;thisisto increasethe margin,whichisthedistancebetweentheboundaryandtheinstancesclosest toit.Forourerrorfunctiontohaveaminimumat hwiththemaximummargin,weshould useanerror(loss) functionwhichnotonlycheckswhetheraninstanceisonthecorrect sideoftheboundarybutalsohowfarawayitis.Insteadof h(x) thatreturns0/ 1 ,we needtohaveahypothesisthatreturnsavaluewhichcarriesameasureofthedistanceto
PreparedByAbinPhilip,AsstProf,TocH.
theboundaryandweneedtohavealossfunction whichusesit,differentfrom theonethatchecks forequality( 1 or 0 ).
IfaninstancefallsbetweenSandGweconsiderit tobeadoubt,iewecannotlabelitwithcertainty.
Thuswecansummarize,amodelforlearningconsist of TofindClassC,saya“familycar” ● Prediction: Iscar xafamily car? ● Knowledgeextraction:Whatdo peopleexpectfromafamilycar? ● Output:TrainingSet Positive(+) andnegative(–) examplesoffamilycars ● Inputrepresentation: x 1 :price, x 2 :enginepower ● Hypothesishwiththelargestmargin(bestseparation) inVersionSpaceiewhichhas theleasterror.
ExplaintheconceptofVCdimensionswithexample WhenisanhypothesissaidtoshatterNpoints HowcanwedeterminetheVCdimensionofaHypothesis HowcanwemeasurethecapacityofaHypothesis ShowthattheVCdimensionofhypothesisbeingrectangleisfourandthatofaline isthree. Justify,canatheVCdimensionofarectangleclassbegreaterthanfour.
Ifwehaveadatasetcontaining N points.These N pointscanbelabeledin
N waysas
positiveandnegative.Therefore, 2
N differentlearningproblemscanbedefinedby Ndata
points.
Ifforanyoftheseproblems,wecanfindahypothesis h∈ Hthatseparatesthepositive examplesfromthenegative,thenwesayH shattersNpoints.
Thatis,anylearningproblemdefinableby N examplescanbelearnedwithnoerrorbya hypothesisdrawnfromH.ThemaximumnumberofpointsthatcanbeshatteredbyHis calledthe Vapnik-Chervonenkis(VC) dimensionofH,isdenotedas VC(H ),andmeasures the capacityofH
Exampleshowing3pointslabelledin ways( 2
N ways). Considerblack circlesasoneclassandwhitecirclesas another.
Assuming the hypothesis to be a separating line, it can separate the
PreparedByAbinPhilip,AsstProf,TocH.
PreparedByAbinPhilip,AsstProf,TocH.