Most data sets which are readily available, have been cleaned for pedagogical purposes. In this post, we present a case study with a realistic touch. This simulated data set has 40,000 entries and 100 predictors, riddled with spelling mistakes, missing entries, etc to mimic real-life data faced by data scientists. This exercise will give us …