ADVANCES in DATA SCIENCE and ANALYTICS

Presenting the concepts and advances of data science and analytics, this volume, written and edited by a global team of experts, also covers practical applications that can be used across multiple disciplines and industries, by both the engineer and the student, with a focus on machine learning, big data, business intelligence, and analytics.

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data. It is related to data mining, deep learning, and big data. Data analytics is a more focused version of this and can even be considered part of the larger process. Analytics is devoted to realizing actionable insights that can be applied immediately based on existing queries. For the purposes of this volume, data science is an umbrella term that encompasses data analytics, data mining, machine learning, and several other related disciplines. While a data scientist is expected to forecast the future based on past patterns, a data analyst extracts meaningful insights from various data sources.

Although data mining and other related areas have been around for a few decades, data science and analytics are still evolving quickly, and the processes and technologies change almost on a day-to-day basis. This volume provides an overview of some of the most important advances in these areas today, including practical coverage of daily applications. Valuable as a learning tool for beginners as well as a daily reference for engineers and scientists working in these areas, this is a must-have for any library.
Preface

1 Implementation Tools for Generating Statistical Consequence Using Data Visualization Techniques
Dr. Ajay B. Gadicha, Dr. Vijay B. Gadicha, Prof. Sneha Bohra and Dr. Niranjanamurthy M.
1.1 Introduction; 1.2 Literature Review; 1.3 Tools in Data Visualization; 1.4 Methodology; 1.4.1 Plotting the Data; 1.4.2 Plotting the Model on Data; 1.4.3 Quantifying Linear Relationships; 1.4.4 Covariance vs. Correlation; 1.5 Conclusion; References

2 Decision Making and Predictive Analysis for Real Time Data
Umesh Pratap Singh
2.1 Introduction; 2.2 Data Analytics; 2.2.1 Descriptive Analytics; 2.2.2 Diagnostic Analytics; 2.2.3 Predictive Analytics; 2.2.4 Prescriptive Analytics; 2.3 Predictive Modeling; 2.4 Categories of Predictive Models; 2.5 Process of Predictive Modeling; 2.5.1 Requirement Gathering; 2.5.2 Data Gathering; 2.5.3 Data Analysis and Massaging; 2.5.4 Machine Learning Statistics; 2.5.5 Predictive Modeling; 2.5.6 Prediction and Decision Making; 2.6 Predictive Analytics Opportunities; 2.6.1 Detecting Fraud; 2.6.2 Reduction of Risk; 2.6.3 Marketing Campaign Optimization; 2.6.4 Operation Improvement; 2.6.5 Clinical Decision Support System; 2.7 Classification of Predictive Analytics Models; 2.7.1 Predictive Models; 2.7.2 Descriptive Models; 2.7.3 Decision Models; 2.8 Predictive Analytics Techniques; 2.8.1 Predictive Analytics Software; 2.8.2 The Importance of Good Data; 2.8.3 Predictive Analytics vs. Business Intelligence; 2.8.4 Pricing Information; 2.9 Data Analysis Tools; 2.9.1 Excel; 2.9.2 Tableau; 2.9.3 Power BI; 2.9.4 Fine Report; 2.9.5 R & Python; 2.10 Advantages & Disadvantages of Predictive Modeling; 2.10.1 Advantages; 2.10.2 Disadvantages; 2.10.2.1 Data Labeling; 2.10.2.2 Obtaining Massive Training Datasets; 2.10.2.3 The Explainability Problem; 2.10.2.4 Generalizability of Learning; 2.10.2.5 Bias in Algorithms and Data; 2.11 Predictive Analytics Biggest Impact; 2.11.1 Predicting Demand; 2.11.2 Transformation Using Technology and Process; 2.11.3 Improved Pricing; 2.11.4 Predictive Maintenance; 2.12 Application of Predictive Analytics; 2.12.1 Financial and Banking Services; 2.12.2 Retail; 2.12.3 Health and Insurance; 2.12.4 Oil and Gas Utilities; 2.12.5 Public Sector; 2.13 Future Scope of Predictive Modeling; 2.13.1 Technological Advancements; 2.13.2 Changes in Work; 2.13.3 Risk Mitigation; 2.14 Conclusion; References

3 Optimizing Water Quality with Data Analytics and Machine Learning
Bin Liang, Zhidong Li, Hongda Tian, Shuming Liang, Yang Wang and Fang Chen
3.1 Introduction; 3.2 Related Work; 3.3 Data Sources and Collection; 3.4 Water Demand Forecasting; 3.4.1 Network Flow and Zone Demand Estimation; 3.4.2 Demand Forecasting; 3.4.2.1 Feature Importance; 3.4.2.2 Forecast Horizon; 3.4.3 Performance Characterization; 3.5 Re-Chlorination Optimization; 3.5.1 Data; 3.5.2 Water Age Estimation; 3.5.2.1 Travel Time Estimation; 3.5.2.2 Residential Time Estimation; 3.5.3 Ammonia Prediction; 3.5.4 Optimization Model Definition; 3.5.5 Improvements in Customer Water Quality; 3.5.6 Plant Dosing Optimization; 3.6 Conclusion; Acknowledgements; References

4 Lip Reading Framework using Deep Learning and Machine Learning
Hemant Kumar Gianey, Parth Khandelwal, Prakhar Goel, Rishav Maheshwari, Bhannu Galhotra and Divyanshu Pratap Singh
4.1 Introduction; 4.1.1 Overview; 4.1.2 Motivation; 4.1.3 Lip Reading System Outcomes and Deliverables; 4.2 The Emergence and Definition of the Lip-Reading System; 4.2.1 Background of Domain; 4.2.2 Identified Problems; 4.2.3 Tools and Technologies Used; 4.2.4 Implementation Aspects; 4.2.4.1 Data Preparation; 4.3 Design and Components of Lip-Reading System; 4.4 Lip Reading System Architecture; 4.5 Testing; 4.6 Problems Encountered During Implementation; 4.6.1 Assumptions and Constraints; 4.7 Conclusion; 4.8 Future Work; References

5 New Perspective to Management, Economic Growth and Debt Nexus Analysis: Evidence from Indian Economy
Edmund Ntom Udemba, Festus Victor Bekun, Dervis Kirikkaleli and Esra Sipahi Döngül
5.1 Introduction; 5.2 Literature Review; 5.2.1 External Debt and Economic Growth; 5.2.2 Trade Openness, FDI, and Economic Growth; 5.2.3 FDI and Economic Growth; 5.3 Data; 5.3.1 Analytical Framework and Data Description; 5.3.2 Theoretical Background and Specifications; 5.3.2.1 Model Specification; 5.4 Methodology and Findings; 5.4.1 Unit Root Testing; 5.4.2 Cointegration; 5.4.3 Vector Error Correction Model; 5.4.4 Long-Run Relationship Estimation; 5.4.5 Causality Test; 5.5 Conclusion and Policy Implications; Declarations; Availability of Data and Materials; Competing Interests; Funding; Authors’ Contributions; Acknowledgments; References

6 Data-Driven Delay Analysis with Applications to Railway Networks
Boyu Li, Ting Guo, Yang Wang and Fang Chen
6.1 Introduction; 6.2 Related Works; 6.3 Background Knowledge; 6.3.1 Background and Problem Formulation; 6.3.1.1 Train Delay; 6.3.1.2 Delay Propagation; 6.3.2 Preliminaries; 6.3.2.1 Bayesian Inference; 6.3.2.2 Markov Property; 6.4 Delay Propagation Model; 6.4.1 Conditional Bayesian Delay Propagation; 6.4.1.1 Delay Self-Propagation; 6.4.1.2 Incremental Run-Time Delay; 6.4.1.3 Incremental Dwell Time Delay; 6.4.1.4 Accumulative Departure Delay; 6.4.2 Cross-Line Propagation, Backward Propagation and Train Connection Propagation; 6.5 Primary Delay Tracing Back; 6.5.1 Delay Candidates Selection; 6.5.2 Relation Construction; 6.5.2.1 Preceding and Following Trains; 6.5.2.2 Preceding and Connecting Trains; 6.6 Evaluation on Dwell Time Improvement Strategy; 6.7 Experiments; 6.7.1 Experiment Setting; 6.7.2 Temporal Prediction of Delay Propagation; 6.7.3 Spatial Prediction of Delay Propagation; 6.7.4 Case Study of Primary Delay Tracing Down; 6.7.5 Evaluation of Dwell Time Improvement Strategy; 6.8 Conclusion; References

7 Proposing a Framework to Analyze Breast Cancer in Mammogram Images Using Global Thresholding, Gray Level Co-Occurrence Matrix, and Convolutional Neural Network (CNN)
Ms. Tanishka Dixit and Ms. Namrata Singh
7.1 Introduction & Purpose of Study; 7.1.1 Segmentation; 7.1.1.1 Types of Segmentation; 7.1.2 Compression; 7.2 Literature Review & Motivation; 7.3 Proposed Work; 7.3.1 Algorithm; 7.3.2 Explanation; 7.3.3 Flowchart; 7.4 Observation Tables and Figures; 7.5 Conclusion; 7.6 Future Work; References

8 IoT Technologies for Smart Healthcare
Rehab A. Rayan, Imran Zafar and Christos Tsagkaris
8.1 Introduction; 8.2 Literature Review; 8.2.1 IoT-Based Smart Health; 8.2.2 Advantages of Applying IoT in Health; 8.3 Findings; 8.3.1 Significant Features and Applications of IoT in Health; 8.3.1.1 Simultaneous Monitoring and Reporting; 8.3.1.2 End-to-End Connectivity and Affordability; 8.3.1.3 Data Analysis; 8.3.1.4 Tracking, Alerts, and Remote Medical Care; 8.3.1.5 Research; 8.3.1.6 Patient-Generated Health Data (PGHD); 8.3.1.7 Management of Chronic Diseases and Preventative Care; 8.3.1.8 Home-Based and Short-Term Care; 8.4 Case Study: CyberMed as an IoT-Based Smart Health Model; 8.5 Discussions; 8.5.1 Limitations of Adopting IoT in Health; 8.5.1.1 Data Security and Privacy; 8.5.1.2 Connectivity; 8.5.1.3 Compatibility and Data Integration; 8.5.1.4 Implementation Cost; 8.5.1.5 Complexity and Risk of Errors; 8.6 Future Insights; 8.7 Conclusions; References

9 Enhancement of Scalability of SVM Classifiers for Big Data
Vijaykumar Bhajantri, Shashikumar G. Totad and Geeta R. Bharamagoudar
9.1 Introduction; 9.2 Support Vector Machine; 9.2.1 Challenges; 9.3 Parallel and Distributed Mechanism; 9.3.1 Shared-Memory Parallelism; 9.4 Distributed Big Data Architecture; 9.4.1 Hadoop MapReduce; 9.4.2 Spark; 9.4.3 Akka; 9.5 Distributed High Performance Computing; 9.5.1 GASNet; 9.5.2 Charm++; 9.6 GPU Based Parallelism; 9.6.1 CUDA; 9.6.2 OpenCL; 9.7 Parallel and Distributed SVM Algorithms; 9.7.1 LS-SVM; 9.7.2 Cascade SVM; 9.7.3 DC-SVM; 9.7.4 Parallel Distributed Multiclass SVM Algorithms; 9.8 Conclusion and Future Research Directions; References

10 Electrical Network-Related Incident Prediction Based on Weather Factors
Hongda Tian, Jessie Nghiem and Fang Chen
10.1 Introduction; 10.2 Related Work; 10.3 Methodology; 10.3.1 Binary Classification of Incident and Normality; 10.3.2 Incident Categorization Using Natural Language Processing; 10.3.3 Classification of Multiple Types of Incidents; 10.4 Experiments; 10.4.1 Data Sets; 10.4.2 Evaluation Metrics; 10.4.3 Binary Classification; 10.4.4 Incident Categorization; 10.4.5 Multi-Class Classification; 10.5 Conclusion and Future Work; Acknowledgements; References

11 Green IoT: Environment-Friendly Approach to IoT
Abhishek Goel and Siddharth Gautam
11.1 Introduction; 11.2 G-IoT (Green Internet of Things); 11.3 Layered Architecture of G-IoT; 11.3.1 Data Center/Cloud; 11.3.2 Data Analytics and Control Applications It; 11.3.3 Data Aggregation and Storage; 11.3.4 Edge Computing; 11.3.5 Communication and Processing Unit; 11.4 Techniques for Implementation of G-IoT; 11.5 Power Saving Methods Based on Components; 11.6 Applications of G-IoT; 11.7 Challenges and Future Scope; 11.8 Case Study; 11.9 Conclusion; References

12 Big-Data Analytics: A New Paradigm Shift in Micro Finance Industry
Vinay Pal Singh, Rohit Bansal and Ram Singh
12.1 Introduction; 12.2 Reality of Area and Transcendent Difficulties; 12.2.1 Probable Overlending; 12.2.2 Information Imbalance; 12.2.3 Retreating Not-for-Profit Sector; 12.2.4 Neighbourhood Pressure; 12.3 Data Analytics in Microfinance; 12.3.1 Types of Data Analytics Used in Microfinance; 12.3.2 Use of Big Data in Microfinance Industry; 12.3.3 Risk and Data Based Credit Decisions; 12.3.4 Product Development and Selection; 12.3.5 Product or Service Positioning; 12.3.6 M-Commerce and E-Payments; 12.3.7 Making Reliable Credit Decisions; 12.3.8 Big Data-Driven Model Promises Psychometric Evaluations; 12.3.9 Product Build-Up, Service Positioning, and Offering; 12.4 Opportunities and Risks in Using Data Analytics; 12.5 Risk in Utilizing Big Data; 12.6 Conclusion; References

13 Big Data Storage and Analysis
Namrata Dhanda
13.1 Introduction; 13.1.1 6 V’s of Big Data; 13.1.2 Types of Data; 13.1.3 Issues in Handling Big Data; 13.2 Hadoop as a Solution to Challenges of Big Data; 13.2.1 The Hadoop Ecosystem; 13.2.2 Rack Awareness Policy in HDFS; 13.3 In-Memory Storage and NoSQL; 13.3.1 Key-Value Data Stores; 13.3.2 Document Stores; 13.3.3 Wide Column Stores; 13.3.4 Graph Stores; 13.3.5 Multi-Modal Databases; 13.4 Advantages of NoSQL Database; 13.5 Conclusion; References

14 A Framework for Analysing Social Media and Digital Data by Applying Machine Learning Techniques for Pandemic Management
Mutyala Sridevi
14.1 Introduction; 14.2 Literature Review; 14.3 Understanding Pandemic Analogous to a Disaster; 14.4 Application of Machine Learning Techniques at Various Phases of Pandemic Management; 14.4.1 Mitigation Phase; 14.4.2 Preparedness Phase; 14.4.3 Response Phase; 14.4.4 Recovery Phase; 14.5 Generalized Framework to Apply Machine Learning Techniques for Pandemic Management; 14.6 Conclusion; References

About the Editors
Index

Product details

ISBN: 9781119791881
Published: 2022-11-01
Publisher: Wiley-Scrivener
Weight: 794 g
Age level: P, 06
Language: English
Format: Hardcover
Number of pages: 352

About the contributors

M. Niranjanamurthy, PhD, is an assistant professor in the Department of Computer Applications, M. S. Ramaiah Institute of Technology, Bangalore, Karnataka, India. He earned his PhD in computer science at JJTU. He has over 13 years of teaching experience and two years of industry experience as a software engineer. He has published four books and 85 papers in technical journals and conferences. He has six patents to his credit and has won numerous awards.

Hemant Kumar Gianey, PhD, is a senior assistant professor in the Computer Science Department at Vellore Institute of Technology, AP, India. He previously worked at Thapar Institute of Engineering and Technology, Patiala, Punjab, India, and as a post-doctoral researcher in the Computer Science and Engineering Department at National Cheng Kung University in Taiwan. He has over 15 years of teaching and industry experience. He has conducted many workshops and has been a guest speaker at various universities. He has also published many research papers in scientific and technical journals.

Amir H. Gandomi, PhD, is a professor of data science in the Department of Engineering and Information Technology, University of Technology Sydney. Before joining UTS, he was an assistant professor at the School of Business, Stevens Institute of Technology, NJ, and a distinguished research fellow at BEACON Center, Michigan State University. He has published over 150 journal papers and four books, which together have been cited more than 14,000 times. He has been named one of the world’s most influential scientific minds and a Highly Cited Researcher (top 1%) for three consecutive years, from 2017 to 2019. He has also served as associate editor, editor, and guest editor for several prestigious journals and has delivered several keynote talks. He is also part of a NASA technology cluster on Big Data, Artificial Intelligence, and Machine Learning.