1 Basic Principles of Data Wrangling
Akshay Singh, Surender Singh and Jyotsna Rathee
1.1 Introduction
1.2 Data Workflow Structure
1.3 Raw Data Stage
1.3.1 Data Input
1.3.2 Output Actions at Raw Data Stage
1.3.3 Structure
1.3.4 Granularity
1.3.5 Accuracy
1.3.6 Temporality
1.3.7 Scope
1.4 Refined Stage
1.4.1 Data Design and Preparation
1.4.2 Structure Issues
1.4.3 Granularity Issues
1.4.4 Accuracy Issues
1.4.5 Scope Issues
1.4.6 Output Actions at Refined Stage
1.5 Produced Stage
1.5.1 Data Optimization
1.5.2 Output Actions at Produced Stage
1.6 Steps of Data Wrangling
1.7 Do's for Data Wrangling
1.8 Tools for Data Wrangling
References

2 Skills and Responsibilities of Data Wrangler
Prabhjot Kaur, Anupama Kaushik and Aditya Kapoor
2.1 Introduction
2.2 Role as an Administrator (Data and Database)
2.3 Skills Required
2.3.1 Technical Skills
2.3.1.1 Python
2.3.1.2 R Programming Language
2.3.1.3 SQL
2.3.1.4 MATLAB
2.3.1.5 Scala
2.3.1.6 Excel
2.3.1.7 Tableau
2.3.1.8 Power BI
2.3.2 Soft Skills
2.3.2.1 Presentation Skills
2.3.2.2 Storytelling
2.3.2.3 Business Insights
2.3.2.4 Writing/Publishing Skills
2.3.2.5 Listening
2.3.2.6 Stop and Think
2.3.2.7 Soft Issues
2.4 Responsibilities as Database Administrator
2.4.1 Software Installation and Maintenance
2.4.2 Data Extraction, Transformation, and Loading
2.4.3 Data Handling
2.4.4 Data Security
2.4.5 Data Authentication
2.4.6 Data Backup and Recovery
2.4.7 Security and Performance Monitoring
2.4.8 Effective Use of Human Resource
2.4.9 Capacity Planning
2.4.10 Troubleshooting
2.4.11 Database Tuning
2.5 Concerns for a DBA
2.6 Data Mishandling and Its Consequences
2.6.1 Phases of Data Breaching
2.6.2 Data Breach Laws
2.6.3 Best Practices for Enterprises
2.7 The Long-Term Consequences: Loss of Trust and Diminished Reputation
2.8 Solution to the Problem
2.9 Case Studies
2.9.1 UBER Case Study
2.9.1.1 Role of Analytics and Business Intelligence in Optimization
2.9.1.2 Mapping Applications for City Ops Teams
2.9.1.3 Marketplace Forecasting
2.9.1.4 Learnings from Data
2.9.2 PepsiCo Case Study
2.9.2.1 Searching for a Single Source of Truth
2.9.2.2 Finding the Right Solution for Better Data
2.9.2.3 Enabling Powerful Results with Self-Service Analytics
2.10 Conclusion
References

3 Data Wrangling Dynamics
Simarjit Kaur, Anju Bala and Anupam Garg
3.1 Introduction
3.2 Related Work
3.3 Challenges: Data Wrangling
3.4 Data Wrangling Architecture
3.4.1 Data Sources
3.4.2 Auxiliary Data
3.4.3 Data Extraction
3.4.4 Data Wrangling
3.4.4.1 Data Accessing
3.4.4.2 Data Structuring
3.4.4.3 Data Cleaning
3.4.4.4 Data Enriching
3.4.4.5 Data Validation
3.4.4.6 Data Publication
3.5 Data Wrangling Tools
3.5.1 Excel
3.5.2 Altair Monarch
3.5.3 Anzo
3.5.4 Tabula
3.5.5 Trifacta
3.5.6 Datameer
3.5.7 Paxata
3.5.8 Talend
3.6 Data Wrangling Application Areas
3.7 Future Directions and Conclusion
References

4 Essentials of Data Wrangling
Menal Dahiya, Nikita Malik and Sakshi Rana
4.1 Introduction
4.2 Holistic Workflow Framework for Data Projects
4.2.1 Raw Stage
4.2.2 Refined Stage
4.2.3 Production Stage
4.3 The Actions in Holistic Workflow Framework
4.3.1 Raw Data Stage Actions
4.3.1.1 Data Ingestion
4.3.1.2 Creating Metadata
4.3.2 Refined Data Stage Actions
4.3.3 Production Data Stage Actions
4.4 Transformation Tasks Involved in Data Wrangling
4.4.1 Structuring
4.4.2 Enriching
4.4.3 Cleansing
4.5 Description of Two Types of Core Profiling
4.5.1 Individual Values Profiling
4.5.1.1 Syntactic
4.5.1.2 Semantic
4.5.2 Set-Based Profiling
4.6 Case Study
4.6.1 Importing Required Libraries
4.6.2 Changing the Order of the Columns in the Dataset
4.6.3 To Display the DataFrame (Top 10 Rows) and Verify that the Columns Are in Order
4.6.4 To Display the DataFrame (Bottom 10 Rows) and Verify that the Columns Are in Order
4.6.5 Generate the Statistical Summary of the DataFrame for All the Columns
4.7 Quantitative Analysis
4.7.1 Maximum Number of Fires on Any Given Day
4.7.2 Total Number of Fires for the Entire Duration for Every State
4.7.3 Summary Statistics
4.8 Graphical Representation
4.8.1 Line Graph
4.8.2 Pie Chart
4.8.3 Bar Graph
4.9 Conclusion
References

5 Data Leakage and Data Wrangling in Machine Learning for Medical Treatment
P.T. Jamuna Devi and B.R. Kavitha
5.1 Introduction
5.2 Data Wrangling and Data Leakage
5.3 Data Wrangling Stages
5.3.1 Discovery
5.3.2 Structuring
5.3.3 Cleaning
5.3.4 Improving
5.3.5 Validating
5.3.6 Publishing
5.4 Significance of Data Wrangling
5.5 Data Wrangling Examples
5.6 Data Wrangling Tools for Python
5.7 Data Wrangling Tools and Methods
5.8 Use of Data Preprocessing
5.9 Use of Data Wrangling
5.10 Data Wrangling in Machine Learning
5.11 Enhancement of Express Analytics Using Data Wrangling Process
5.12 Conclusion
References

6 Importance of Data Wrangling in Industry 4.0
Rachna Jain, Geetika Dhand, Kavita Sheoran and Nisha Aggarwal
6.1 Introduction
6.1.1 Data Wrangling Entails
6.2 Steps in Data Wrangling
6.2.1 Obstacles Surrounding Data Wrangling
6.3 Data Wrangling Goals
6.4 Tools and Techniques of Data Wrangling
6.4.1 Basic Data Munging Tools
6.4.2 Data Wrangling in Python
6.4.3 Data Wrangling in R
6.5 Ways for Effective Data Wrangling
6.5.1 Ways to Enhance Data Wrangling Pace
6.6 Future Directions
References

7 Managing Data Structure in R
Mittal Desai and Chetan Dudhagara
7.1 Introduction to Data Structure
7.2 Homogeneous Data Structures
7.2.1 Vector
7.2.2 Factor
7.2.3 Matrix
7.2.4 Array
7.3 Heterogeneous Data Structures
7.3.1 List
7.3.2 Dataframe
References

8 Dimension Reduction Techniques in Distributional Semantics: An Application Specific Review
Pooja Kherwa, Jyoti Khurana, Rahul Budhraj, Sakshi Gill, Shreyansh Sharma and Sonia Rathee
8.1 Introduction
8.2 Application Based Literature Review
8.3 Dimensionality Reduction Techniques
8.3.1 Principal Component Analysis
8.3.2 Linear Discriminant Analysis
8.3.2.1 Two-Class LDA
8.3.2.2 Three-Class LDA
8.3.3 Kernel Principal Component Analysis
8.3.4 Locally Linear Embedding
8.3.5 Independent Component Analysis
8.3.6 Isometric Mapping (Isomap)
8.3.7 Self-Organising Maps
8.3.8 Singular Value Decomposition
8.3.9 Factor Analysis
8.3.10 Auto-Encoders
8.4 Experimental Analysis
8.4.1 Datasets Used
8.4.2 Techniques Used
8.4.3 Classifiers Used
8.4.4 Observations
8.4.5 Results Analysis Red-Wine Quality Dataset
8.5 Conclusion
References

9 Big Data Analytics in Real Time for Enterprise Applications to Produce Useful Intelligence
Prashant Vats and Siddhartha Sankar Biswas
9.1 Introduction
9.2 The Internet of Things and Big Data Correlation
9.3 Design, Structure, and Techniques for Big Data Technology
9.4 Aspiration for Meaningful Analyses and Big Data Visualization Tools
9.4.1 From Information to Guidance
9.4.2 The Transition from Information Management to Valuation Offerings
9.5 Big Data Applications in the Commercial Surroundings
9.5.1 IoT and Data Science Applications in the Production Industry
9.5.1.1 Devices That Are Interlinked
9.5.1.2 Data Transformation
9.5.2 Predictive Analysis for Corporate Enterprise Applications in the Industrial Sector
9.6 Big Data Insights' Constraints
9.6.1 Technological Developments
9.6.2 Representation of Data
9.6.3 Data That Is Fragmented and Imprecise
9.6.4 Extensibility
9.6.5 Implementation in Real Time Scenarios
9.7 Conclusion
References

10 Generative Adversarial Networks: A Comprehensive Review
Jyoti Arora, Meena Tushir, Pooja Kherwa and Sonia Rathee
List of Abbreviations
10.1 Introduction
10.2 Background
10.2.1 Supervised vs Unsupervised Learning
10.2.2 Generative Modeling vs Discriminative Modeling
10.3 Anatomy of a GAN
10.4 Types of GANs
10.4.1 Conditional GAN (CGAN)
10.4.2 Deep Convolutional GAN (DCGAN)
10.4.3 Wasserstein GAN (WGAN)
10.4.4 Stack GAN
10.4.5 Least Square GAN (LSGANs)
10.4.6 Information Maximizing GAN (INFOGAN)
10.5 Shortcomings of GANs
10.6 Areas of Application
10.6.1 Image
10.6.2 Video
10.6.3 Artwork
10.6.4 Music
10.6.5 Medicine
10.6.6 Security
10.7 Conclusion
References

11 Analysis of Machine Learning Frameworks Used in Image Processing: A Review
Gurpreet Kaur and Kamaljit Singh Saini
11.1 Introduction
11.2 Types of ML Algorithms
11.2.1 Supervised Learning
11.2.2 Unsupervised Learning
11.2.3 Reinforcement Learning
11.3 Applications of Machine Learning Techniques
11.3.1 Personal Assistants
11.3.2 Predictions
11.3.3 Social Media
11.3.4 Fraud Detection
11.3.5 Google Translator
11.3.6 Product Recommendations
11.3.7 Video Surveillance
11.4 Solution to a Problem Using ML
11.4.1 Classification Algorithms
11.4.2 Anomaly Detection Algorithm
11.4.3 Regression Algorithm
11.4.4 Clustering Algorithms
11.4.5 Reinforcement Algorithms
11.5 ML in Image Processing
11.5.1 Frameworks and Libraries Used for ML Image Processing
11.6 Conclusion
References

12 Use and Application of Artificial Intelligence in Accounting and Finance: Benefits and Challenges
Ram Singh, Rohit Bansal and Niranjanamurthy M.
12.1 Introduction
12.1.1 Artificial Intelligence in Accounting and Finance Sector
12.2 Uses of AI in Accounting & Finance Sector
12.2.1 Pay and Receive Processing
12.2.2 Supplier Onboarding and Procurement
12.2.3 Audits
12.2.4 Monthly, Quarterly Cash Flows, and Expense Management
12.2.5 AI Chatbots
12.3 Applications of AI in Accounting and Finance Sector
12.3.1 AI in Personal Finance
12.3.2 AI in Consumer Finance
12.3.3 AI in Corporate Finance
12.4 Benefits and Advantages of AI in Accounting and Finance
12.4.1 Changing the Human Mindset
12.4.2 Machines Imitate the Human Brain
12.4.3 Fighting Misrepresentation
12.4.4 AI Machines Make Accounting Tasks Easier
12.4.5 Invisible Accounting
12.4.6 Build Trust through Better Financial Protection and Control
12.4.7 Active Insights Help Drive Better Decisions
12.4.8 Fraud Protection, Auditing, and Compliance
12.4.9 Machines as Financial Guardians
12.4.10 Intelligent Investments
12.4.11 Consider the "Runaway Effect"
12.4.12 Artificial Control and Effective Fiduciaries
12.4.13 Accounting Automation Avenues and Investment Management
12.5 Challenges of AI Application in Accounting and Finance
12.5.1 Data Quality and Management
12.5.2 Cyber and Data Privacy
12.5.3 Legal Risks, Liability, and Culture Transformation
12.5.4 Practical Challenges
12.5.5 Limits of Machine Learning and AI
12.5.6 Roles and Skills
12.5.7 Institutional Issues
12.6 Suggestions and Recommendation
12.7 Conclusion and Future Scope of the Study
References

13 Obstacle Avoidance Simulation and Real-Time Lane Detection for AI-Based Self-Driving Car
B. Eshwar, Harshaditya Sheoran, Shivansh Pathak and Meena Rao
13.1 Introduction
13.1.1 Environment Overview
13.1.1.1 Simulation Overview
13.1.1.2 Agent Overview
13.1.1.3 Brain Overview
13.1.2 Algorithm Used
13.1.2.1 Markov Decision Process (MDP)
13.1.2.2 Adding a Living Penalty
13.1.2.3 Implementing a Neural Network
13.2 Simulations and Results
13.2.1 Self-Driving Car Simulation
13.2.2 Real-Time Lane Detection and Obstacle Avoidance
13.2.3 About the Model
13.2.4 Preprocessing the Image/Frame
13.3 Conclusion
References

14 Impact of Suppliers Network on SCM of Indian Auto Industry: A Case of Maruti Suzuki India Limited
Ruchika Pharswan, Ashish Negi and Tridib Basak
14.1 Introduction
14.2 Literature Review
14.2.1 Prior Pandemic Automobile Industry/COVID-19 Thump on the Automobile Sector
14.2.2 Maruti Suzuki India Limited (MSIL) During COVID-19 and Other Players in the Automobile Industry and How MSIL Prevailed
14.3 Methodology
14.4 Findings
14.4.1 Worldwide Economic Impact of the Epidemic
14.4.2 Effect on Global Automobile Industry
14.4.3 Effect on Indian Automobile Industry
14.4.4 Automobile Industry Scenario That Can Be Expected Post COVID-19 Recovery
14.5 Discussion
14.5.1 Competitive Dimensions
14.5.2 MSIL Strategies
14.5.3 MSIL Operations and Supply Chain Management
14.5.4 MSIL Suppliers Network
14.5.5 MSIL Manufacturing
14.5.6 MSIL Distributors Network
14.5.7 MSIL Logistics Management
14.6 Conclusion
References

About the Editors
Index