






Mobility analysis research summarized
Typology: Lecture notes







Sequential Mobility Data Analysis Versus Parallel Mobility Data Analysis

Student's Name
Institutional Affiliation
Module Director
Date
Signature
Abstract

Mobility datasets are growing exponentially, which makes it computationally difficult to analyse and process such large volumes of data efficiently. This research examines whether parallel computing methods can improve the performance of analysing the mobility data provided by the Bureau of Transportation Statistics (BTS). The study aims to characterise travel behaviour, understand population mobility patterns, and build predictive models of travel frequency as a function of trip distance. Both sequential and parallel processing are implemented, and parallel performance is tested with 10 and 20 processors. The findings show that parallel computing takes much less time than sequential processing without compromising the analysis. They also show that most travel activity is concentrated over short distances and that high trip frequencies are uncommon. The paper concludes that parallel computing is an essential part of contemporary big-data processing in data science.
and Ratti, 2010). The data are based on anonymised mobile device records from various providers, which provide high accuracy and representativeness without encroaching on user privacy. The dataset records dates, population counts, and trip frequencies across a range of distance bands. These variables make it possible to examine mobility patterns in both time and space (Herwin et al., 2022). The key challenge addressed in this study is processing such a large dataset efficiently while retaining accuracy and scalability, especially when the analysis is repeated.

Data Pre-Processing

The data were pre-processed to ensure they were clean, consistent, and fit for analysis. Missing values were handled by removing incomplete records, and all relevant data fields were properly formatted (Dean and Ghemawat, 2008). The date columns were converted to date-time types to enable time-based analysis. The data were also aggregated by week to simplify the analysis and improve computational efficiency. Feature selection focused on the most significant variables, such as number of trips, distance type, and population mobility measures (Bureau of Transportation Statistics, 2023). These steps ensured that the dataset was organised efficiently for analysis and modelling.

Data Classification

The data were classified into separate distance categories: short-distance, medium-distance, and long-distance travel. The classification
helped to clarify the mobility patterns and to communicate the analysis more effectively (Dean and Ghemawat, 2008). The categories were assigned by applying logical conditions to the dataset, so that each trip was placed in the category matching its distance (Sevtsuk and Ratti, 2010). This improved the interpretability of the data and formed a basis for further analysis and modelling, especially for identifying trends across different types of travel behaviour.

Data Modelling

A predictive model was created to estimate how often people travel as a function of trip distance. Linear regression was chosen because it is simple to use and efficient at identifying relationships between variables (Dean and Ghemawat, 2008). Trip distance was the independent variable and the number of trips was the dependent variable. The relationship was plotted as a scatter diagram to visualise the linearity between these variables, and distance was found to have a moderate correlation with travel frequency. The analysis showed that the shorter the distance, the higher the frequency of travel, implying that the majority of mobility activity is localised (Herwin et al., 2022). Although it is informative, the model has a weakness: it assumes that travel patterns are linear, which may not fully capture the actual patterns.

Model Evaluation

Root Mean Square Error (RMSE) and R-squared (R²) were used to assess the performance of the predictive model. RMSE gave an indication of the average prediction error, and R² was used to give the percentage of variance in the dependent variable that is
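
The classification, regression, and evaluation steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the distance thresholds, the function names, and the five sample data points are all assumptions made for the example, and only the standard library is used so that it runs without third-party packages.

```python
from math import sqrt
from statistics import mean

# Hypothetical category thresholds in miles; the paper does not state its cut-offs.
SHORT_MAX = 10.0
MEDIUM_MAX = 100.0


def classify_distance(miles):
    """Assign a trip distance to a category using simple logical conditions."""
    if miles < SHORT_MAX:
        return "short"
    if miles < MEDIUM_MAX:
        return "medium"
    return "long"


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def rmse(y_true, y_pred):
    """Root mean square error: the typical size of a prediction error."""
    return sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def r_squared(y_true, y_pred):
    """Share of the variance in y_true that the predictions explain."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up illustrative data: trip distance (x) against number of trips (y).
distances = [1.0, 2.0, 3.0, 4.0, 5.0]
trips = [9.0, 8.0, 6.0, 5.0, 2.0]

slope, intercept = fit_line(distances, trips)
predicted = [slope * d + intercept for d in distances]
model_rmse = rmse(trips, predicted)
model_r2 = r_squared(trips, predicted)
```

A negative `slope` corresponds to the pattern reported in the paper, where trip counts fall as distance grows. On real data, a library such as scikit-learn would normally replace the hand-written fit.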
The travel behaviour analysis showed that a relatively high percentage of the population stayed at home every week, indicating that many people were not very mobile (Pappalardo et al., 2022). Among those who did travel, short distances were the most common, implying that most activities took place locally. The analysis further found that more than 10 million individuals made between 10 and 25 trips on certain dates, and that more than 10 million individuals made between 50 and 100 trips on certain dates (Pappalardo et al., 2022). A scatter-plot comparison showed that lower trip frequencies occurred more often and more regularly, whereas higher trip frequencies occurred less often. This suggests that extreme travel behaviour is uncommon and may be driven by external factors such as holidays or major events. The predictive modelling confirmed a correlation between trip distance and travel frequency, although the relationship was only moderate in strength (Herwin et al., 2022). Visualising travellers by distance category reinforced the conclusion that short-distance trips are the dominant mode of mobility and that long-distance trips are comparatively uncommon.

Data Visualisation

Several visualisation methods were used to make the results easier to interpret. Weekly mobility trends were displayed in line graphs, whereas trip frequencies across categories were compared using scatter plots. Bar charts showed the distribution of trips across distance ranges. These visualisations gave a clear view of the data, which helped to explain mobility trends and support the conclusions
made during the analysis (Pappalardo et al., 2022). Graphical representations also made it easier to communicate the findings to a wider audience.

Discussion and Interpretation

The results of this study highlight several significant aspects of mobility behaviour and data processing methods. Travel is mainly short-distance, meaning that most individuals engage in localised activities, which is consistent with everyday behaviour patterns (Pedregosa et al., 2011). The relative rarity of high-frequency travel indicates that it is a behaviour conditioned by particular situations rather than a general trend. The comparison of sequential and parallel processing demonstrates the substantial benefits of parallel processing when working with large datasets. Parallel computing improves scalability and reduces execution time by distributing computational tasks across two or more processors (Pedregosa et al., 2011). However, overhead costs and diminishing returns mean that parallel systems must be optimised with care. The shortcomings of the predictive model also show the importance of choosing the right modelling technique (Herwin et al., 2022). Although linear regression is a good starting point, more advanced models may be needed to capture complex dependencies in the data.

Conclusion

The paper illustrates how data science can be used to examine large mobility datasets. Parallel computing is an efficient way of processing data, and integrating it is very useful in the analysis of big data. The results indicate that travel behaviour is localised, with the majority of trips being short-distance. Also, the predictive modelling method offers
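
The two points made in the discussion, that parallel processing leaves the result unchanged and that adding processors gives diminishing returns, can be illustrated with a short sketch. It is not the study's implementation. The chunked sum stands in for any aggregation over the mobility records, the function names are hypothetical, and Amdahl's law is brought in here as a standard textbook model that the paper itself does not cite. The parallel fraction of 0.9 is an assumed value, not a measurement.

```python
def split_into_chunks(rows, n_chunks):
    """Split a list into at most n_chunks contiguous pieces of similar size."""
    size = max(1, -(-len(rows) // n_chunks))  # ceiling division, at least 1
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def total_trips(rows, n_workers=1, mapper=map):
    """Sum trip counts chunk by chunk, then combine the partial sums.

    `mapper` defaults to the built-in, sequential `map`. Passing the `map`
    method of a `concurrent.futures.ProcessPoolExecutor` instead would run
    the per-chunk sums in separate processes.
    """
    chunks = split_into_chunks(rows, n_workers)
    return sum(mapper(sum, chunks))


def amdahl_speedup(parallel_fraction, n_processors):
    """Ideal speedup under Amdahl's law, ignoring communication overhead."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


# Made-up trip counts; chunking must not change the total.
trip_counts = list(range(1, 101))
sequential_total = total_trips(trip_counts, n_workers=1)
chunked_total = total_trips(trip_counts, n_workers=10)

# Assumed parallel fraction of 0.9, for illustration only.
speedup_10 = amdahl_speedup(0.9, 10)
speedup_20 = amdahl_speedup(0.9, 20)
```

Under this assumed fraction, doubling from 10 to 20 processors raises the ideal speedup by far less than a factor of two, which is the kind of diminishing return the discussion warns about. Real runs would also pay process start-up and data-transfer costs that the formula leaves out.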
Appendix