Implementing an entire Data Science (AI) solution

Part 1 of a three-part series on how to implement an AI solution, from data cleaning and visualization to building a Machine Learning model and deploying it on the web!


Data Science? AI? Machine Learning? These three buzzwords are thrown around everywhere, and they can be confusing! This series is for those with little or no prior experience in these three trending areas. By the end of the series, you should have a basic understanding of each, and you'll even have your own AI solution up and running (a Machine Learning model that takes different house features and returns an estimated price for the house), which you can add to your portfolio! The series assumes you know at least the basics of Python, or of programming in general. If you're new to programming, refer to this quick read before jumping into this article.

Artificial Intelligence

Machine Learning

Data Science

Setting up the execution environment

The image above pretty much describes what you'll normally use as a beginner, but you can take a look at this article for a deeper dive into using Colab.

Explore and Analyze Data in Python

Unsurprisingly, our project starts with exploring and analyzing data. The results of this analysis might form the basis of a report or a machine learning model, but it all begins with data. For this project, we'll stick with the famous Boston House Pricing dataset, using it to build a model that predicts the median value of owner-occupied homes in thousands of USD.

Importing required libraries

import numpy as np               # numerical arrays and math
import pandas as pd              # loading and manipulating tabular data (DataFrames)
import matplotlib.pyplot as plt  # visualizations

Loading our data

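# Load the training CSV from GitHub. The ?token= part looks like a temporary access token, so the link may expire.
df = pd.read_csv('https://raw.githubusercontent.com/Azariagmt/Implementing-an-entire-Data-Science-AI-solution/master/Data/Training_set_boston.csv?token=ANOIBQLX4ZBHXSUPYIWMMELANHH3O')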

The pandas read_csv function loads data from CSV and other delimited text files. It accepts either a local file path or a URL pointing to the file, so here we load the CSV straight from GitHub into our Colab environment, more specifically into the df variable.
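To see what read_csv does without depending on the hosted file, here is a minimal sketch that parses a few made-up CSV lines from an in-memory buffer (io.StringIO). The three columns reuse names from the dataset, but the values are invented for illustration; passing a URL string instead of the buffer works the same way.

```python
import io

import pandas as pd

# A tiny, made-up CSV snippet using three of the dataset's column names.
# These values are illustrative only, not rows from the real dataset.
csv_text = """CRIM,RM,MEDV
0.1,6.5,20.0
0.2,6.0,25.5
0.3,7.1,30.1
"""

# read_csv accepts a file path, a URL, or any file-like object such as StringIO.
toy_df = pd.read_csv(io.StringIO(csv_text))

print(toy_df.shape)   # (rows, columns)
print(toy_df.dtypes)  # column types are inferred from the text
```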

The next step is to take a peek at our data and get a sense of what it looks like. Calling the head() method on our DataFrame, as below, returns its first five rows.
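head() returns five rows by default, but you can pass a different count, and tail() shows the last rows instead. A quick sketch on a hypothetical eight-row frame (only the MEDV column name comes from the dataset; the values are made up):

```python
import pandas as pd

# Hypothetical toy frame with 8 rows of made-up values.
toy_df = pd.DataFrame({"MEDV": [20.0, 25.5, 30.1, 18.2, 22.4, 27.9, 15.0, 33.3]})

first_five = toy_df.head()    # default n=5
first_two = toy_df.head(2)    # any n can be passed
last_three = toy_df.tail(3)   # tail() is the mirror image of head()

print(first_five)
```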

df.head()

If you got output like the one above, great job! Each column represents:

CRIM: per capita crime rate by town

ZN: proportion of residential land zoned for lots over 25,000 sq.ft.

INDUS: proportion of non-retail business acres per town

CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)

NOX: nitric oxides concentration (parts per 10 million)

RM: average number of rooms per dwelling

AGE: proportion of owner-occupied units built prior to 1940

DIS: weighted distances to five Boston employment centres

RAD: index of accessibility to radial highways

TAX: full-value property-tax rate per 10,000 USD

PTRATIO: pupil-teacher ratio by town

B: 1000(Bk − 0.63)², where Bk is the proportion of Black residents by town

LSTAT: lower status of the population (%)

MEDV: median value of owner-occupied homes in thousands of USD (target)
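Because MEDV is the target, a later modelling step will typically need it separated from the feature columns. Here is a minimal sketch of that split, assuming a DataFrame with the columns listed above; to keep it short, the toy frame uses made-up values and only a few of the columns. On the real df, the same drop and selection give you every column other than MEDV as features.

```python
import pandas as pd

# Made-up toy rows using a subset of the dataset's column names.
toy_df = pd.DataFrame({
    "CRIM": [0.1, 0.2, 0.3],
    "RM": [6.5, 6.0, 7.1],
    "LSTAT": [5.0, 9.1, 4.0],
    "MEDV": [20.0, 25.5, 30.1],
})

# Features: every column except the target.
X = toy_df.drop(columns=["MEDV"])
# Target: the median home value (in thousands of USD).
y = toy_df["MEDV"]

print(X.columns.tolist())
print(y.tolist())
```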

Handling missing values

df.isnull().sum()

Calling these methods on our DataFrame returns the number of null (missing) values in each column.
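isnull() returns a DataFrame of True/False flags (True where a value is missing), and sum() adds those flags up column by column, since True counts as 1. A small sketch on a made-up frame with deliberately missing entries shows what non-zero counts look like:

```python
import numpy as np
import pandas as pd

# Made-up toy frame with a few deliberately missing values (np.nan).
toy_df = pd.DataFrame({
    "RM": [6.5, np.nan, 7.1, 6.2],
    "AGE": [65.2, 78.9, np.nan, np.nan],
    "MEDV": [20.0, 25.5, 30.1, 18.2],
})

flags = toy_df.isnull()            # boolean DataFrame, same shape as toy_df
missing_per_column = flags.sum()   # True counts as 1, so this counts NaNs per column

print(missing_per_column)
```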

Every column shows a count of 0, so this dataset has no missing values and you've been spared for today. Real-world data, however, is rarely this complete, and we'll usually need some preprocessing to fill in (or drop) the missing values.
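One common way to fill in missing numeric values is to replace them with each column's median. The sketch below does this with fillna and the medians from median(); choosing median imputation is an assumption made for illustration (the mean, a constant, or dropping incomplete rows with dropna are other options), and the toy values are made up.

```python
import numpy as np
import pandas as pd

# Made-up toy frame with deliberately missing values.
toy_df = pd.DataFrame({
    "RM": [6.5, np.nan, 7.1, 6.2],
    "AGE": [65.2, 78.9, np.nan, np.nan],
})

# median() skips NaN by default, so each median uses only the observed values.
medians = toy_df.median()

# fillna with a Series fills each column with its matching median.
filled = toy_df.fillna(medians)

# Alternative: drop every row that contains at least one missing value.
dropped = toy_df.dropna()

print(filled)
print(filled.isnull().sum().sum())  # total missing values left after filling
```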

Gaining more insights

df.describe()

The output shows summary statistics for each numeric column: the count, mean, standard deviation, minimum, 25th, 50th and 75th percentiles (the quartiles), and maximum.
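To make the describe() output concrete, the sketch below runs it on a made-up column and compares its 25%, 50% and 75% rows with the quartiles computed directly by quantile(); both use linear interpolation by default, so they agree.

```python
import pandas as pd

# Made-up MEDV-style values (thousands of USD), not taken from the real dataset.
toy_df = pd.DataFrame({"MEDV": [15.0, 18.0, 20.0, 24.0, 30.0]})

summary = toy_df.describe()  # rows: count, mean, std, min, 25%, 50%, 75%, max
quartiles = toy_df["MEDV"].quantile([0.25, 0.5, 0.75])

print(summary)
print(quartiles)
```

The 50% row is simply the median of the column.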

Visualizing data

We'll dive deep into data visualization, and finish getting our data ready for our Machine Learning model, in the next part of the series, which will be released on Friday, April 09. Stay tuned!

Microsoft Learn Student Ambassador | Azure Associate Data Scientist | Web Developer
