Applying Data Science to Daily Life: Battle of the Neighborhoods

Data science skills need not remain restricted to projects at work. Data can be found all around us, and when used effectively, data can help us answer questions, make decisions, and solve problems in our daily lives. One situation that may benefit from data-driven decision-making is choosing a neighborhood when relocating. In this post, we …


Sentiment Analysis: IMDB movie review classification using Natural Language Toolkit (NLTK) and Random Forest Classifier

Sentiment analysis, also known as emotion AI or opinion mining, refers to the use of natural language processing (NLP) to systematically identify, extract, and quantify subjective information. The ability to mine information from textual data can provide details that are critical to businesses. For instance, customers of Amazon often leave a …


Stock Price Forecasting (NYSE: IBM) with Deep Learning using Multilayer Perceptron (MLP)

This article illustrates the use of deep learning, specifically the multilayer perceptron (MLP), to forecast stock prices for IBM. In a previous article, we demonstrated traditional time-series analysis using the ARIMA model. The results were fascinating, as we saw that the normalized mean squared error was in …


Stock Price Forecasting (NASDAQ: AMZN) with Time Series Analysis using ARIMA Model

Time series analysis is of great importance in quantitative financial analysis, where the ability to accurately forecast stock prices from past prices helps investment companies decide when to buy, sell, or hold stocks. In today's article, we will explain and demonstrate with real-world financial data, attempting to forecast the open …


SQL Querying with Joins, Set Theory Clauses, and Subqueries

In this article, we introduce SQL querying techniques that are more applicable to real-life databases. In a realistic data environment, a data scientist will likely need to retrieve data from multiple sources before analyzing it with statistical methods. In this case, techniques such as joining columns of information from different databases, …


Convolutional Neural Networks with Keras for Multiclass Image Classification Problems

A convolutional neural network (CNN) is a class of deep learning neural network that is most commonly used for analyzing images. The term convolution refers to a mathematical operation on two functions that produces a third function expressing how the shape of one is modified by the other. Similarly, in CNNs a chosen kernel convolves with the data …
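To make the idea of convolution concrete, here is a minimal sketch of a discrete 1-D convolution in plain Python. The function name `convolve_1d` and the sample values are illustrative assumptions, not code from the article. Note also that the convolution layers in deep learning libraries typically slide the kernel without flipping it (a cross-correlation), so this is the textbook operation rather than a layer implementation.

```python
def convolve_1d(signal, kernel):
    """Full discrete convolution: out[n] = sum over m of signal[m] * kernel[n - m]."""
    n_out = len(signal) + len(kernel) - 1
    out = [0.0] * n_out
    # Each pair (signal[i], kernel[j]) contributes to output position i + j.
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


# Hypothetical inputs: a short ramp and a simple difference kernel.
signal = [1.0, 2.0, 3.0, 4.0]
kernel = [1.0, 0.0, -1.0]
result = convolve_1d(signal, kernel)
print(result)  # [1.0, 2.0, 2.0, 2.0, -3.0, -4.0]
```

The constant value of 2.0 in the middle of the output reflects the constant slope of the ramp, which is the kind of local pattern a learned kernel picks up in an image.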


Deep Learning Neural Networks with Keras for Classification Problems

Deep learning is part of a broader family of machine learning methods based on artificial neural networks. It can be trained in both supervised and unsupervised settings, and has proven to be an effective modeling tool for both linear and non-linear problems. A widely used deep learning library is Keras, a high-level neural networks …


Types of Machine Learning Performance Metrics and When to Use Them

This article discusses the most commonly used performance metrics in classification and regression machine learning problems, in the hope that readers will become familiar with how the metrics are calculated, what the calculated values mean, and when each metric is useful. It is critical to understand that no single metric can validate the result entirely. …


Balanced Bootstrapping with Random Forest for Imbalanced Data Sets

The curse of imbalanced data refers to the tendency of machine learning models trained on skewed data to predict the majority outcome and neglect the minority. This is a common problem in fraud detection, cancer classification, and similar tasks. In those examples, although the occurrence of the positive outcome is rare, it is crucial that the machine learning model is able …
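As a rough illustration of the idea behind a balanced bootstrap, the sketch below draws the same number of rows, with replacement, from every class. This is one common reading of the technique and not necessarily the procedure the article uses. The helper name `balanced_bootstrap`, the 95/5 class split, and the seed are all assumptions made for the example.

```python
import random
from collections import Counter


def balanced_bootstrap(labels, rng):
    """Return row indices sampled with replacement, equal count per class."""
    # Group row indices by class label.
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    # Draw as many rows per class as the smallest class contains.
    n_per_class = min(len(indices) for indices in by_class.values())
    sample = []
    for indices in by_class.values():
        sample.extend(rng.choices(indices, k=n_per_class))
    return sample


# Hypothetical imbalanced labels: 95 negatives and 5 positives.
labels = [0] * 95 + [1] * 5
rng = random.Random(0)
idx = balanced_bootstrap(labels, rng)
counts = Counter(labels[i] for i in idx)
print(counts[0], counts[1])  # 5 5
```

In a random forest setting, each tree would be fitted on its own balanced sample of this kind, so that no single tree is dominated by the majority class.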
