Understanding Structured Data Classification: A Comprehensive Guide
Table of Contents:
Structured Data Classification
Introduction
Classification can be performed on structured or unstructured data.
Let's start with the classification of structured data.
Before getting into structured data specifically, what is classification?
Classification is a technique for categorizing data into a given number of classes. The main goal of a classification problem is to identify the category (class) that a new data point belongs to.
Now, what is structured data?
Any data with a high level of organization can be considered structured data. This includes data in an Excel sheet, a relational database, and so on.
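To make "a high level of organization" concrete, here is a minimal sketch that loads a small table with Python's standard `csv` module. The column names and values are hypothetical, chosen only for illustration; the point is that every row shares the same named fields.

```python
import csv
import io

# Hypothetical structured data: every row has the same named columns.
RAW = """age,income,owns_home
34,52000,yes
27,38000,no
45,91000,yes
"""

def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

rows = load_rows(RAW)
columns = list(rows[0].keys())

print(columns)            # the shared schema of every row
print(rows[0]["income"])  # values are addressed by column name
```

Because each record follows the same schema, any column can be picked out as a feature or as the class label without further parsing.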
Vocabulary: Classification
- Classifier: An algorithm that maps input data to a specific category.
- Feature: An individual measurable property of the phenomenon being observed.
- Feature selection: The process of identifying or deriving the most meaningful data (features) from the given input.
- Classification model: A model that draws conclusions from the input values given for training and then predicts the class labels (categories) for new data.
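The terms above can be tied together with a small sketch. It uses a nearest-centroid rule, picked here only because it is short enough to write in plain Python; it is one possible classifier, not a recommended method. The class name `NearestCentroidClassifier`, the features (height in cm, weight in kg), and the data are all hypothetical.

```python
from collections import defaultdict
from math import dist  # available in Python 3.8+

class NearestCentroidClassifier:
    """Classifier: maps a feature vector to the label of the closest class centroid."""

    def fit(self, X, y):
        # Group the training feature vectors by their class label.
        grouped = defaultdict(list)
        for features, label in zip(X, y):
            grouped[label].append(features)
        # The learned model is one mean feature vector (centroid) per class.
        self.centroids_ = {
            label: tuple(sum(col) / len(col) for col in zip(*rows))
            for label, rows in grouped.items()
        }
        return self

    def predict(self, X):
        # Assign each new sample to the class whose centroid is nearest.
        return [
            min(self.centroids_, key=lambda label: dist(features, self.centroids_[label]))
            for features in X
        ]

# Hypothetical training data: each sample has two features (height_cm, weight_kg).
X_train = [(30, 4), (35, 5), (60, 25), (70, 30)]
y_train = ["cat", "cat", "dog", "dog"]

model = NearestCentroidClassifier().fit(X_train, y_train)
predictions = model.predict([(32, 4.5), (65, 28)])
print(predictions)  # ['cat', 'dog']
```

Here the tuples are the features, `fit` builds the classification model from labeled training data, and `predict` assigns class labels to new data.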
Vocabulary: Classification Types
- Binary classification: A classification task with two possible outcomes. E.g., gender classification (male/female).
- Multi-class classification: Classification with more than two classes, where each sample is assigned to one and only one target label. E.g., an animal can be a cat or a dog, but not both at the same time.
- Multi-label classification: A classification task where each sample is mapped to a set of target labels (possibly more than one class). E.g., a news article can be about sports, a person, and a location at the same time.
- Supervised classification: A technique where learning is based on a training set of correctly labeled observations. E.g., email classification where the input data is a set of emails labeled as spam or not spam.
- Unsupervised classification: Grouping observations into categories based on some similarity measure, without labeled training data. E.g., grouping news articles based on their content.
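The difference between binary, multi-class, and multi-label tasks shows up in the shape of the target labels. The sketch below uses made-up labels and a hypothetical helper, `to_indicator_matrix`, which converts multi-label targets into one 0/1 column per class. This is a common representation, shown here under those assumptions.

```python
# Binary: each sample gets one of exactly two labels.
binary_labels = ["spam", "not spam", "spam"]

# Multi-class: each sample gets exactly one label out of more than two classes.
multiclass_labels = ["cat", "dog", "bird"]

# Multi-label: each sample gets a set of labels, possibly more than one.
multilabel_labels = [
    {"sports", "person"},
    {"location"},
    {"sports", "person", "location"},
]

def to_indicator_matrix(label_sets):
    """Return the sorted class list and a 0/1 row per sample (hypothetical helper)."""
    classes = sorted(set().union(*label_sets))
    matrix = [[1 if c in labels else 0 for c in classes] for labels in label_sets]
    return classes, matrix

classes, matrix = to_indicator_matrix(multilabel_labels)
print(classes)  # ['location', 'person', 'sports']
for row in matrix:
    print(row)
```

In the multi-class case every row of such a matrix would contain exactly one 1, while in the multi-label case a row may contain several.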
Vocabulary: Statistical Data
Quantitative Variables:
- Discrete: Numeric variables with a countable number of values between any two values.
- Continuous: Numeric variables with an infinite number of values between any two values.
Qualitative Variables:
- Categorical: Variables that have a finite number of categories or groups with no logical order.
- Ordinal: Variables similar to categorical variables, but with a clear ordering of the categories.
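The distinction between categorical and ordinal variables matters when the values are turned into numbers for a classifier. As a minimal sketch with hypothetical data and helper names (`one_hot`, `ordinal_encode`), an unordered variable can be one-hot encoded so that no artificial order is implied, while an ordinal variable can be mapped to integers that preserve its order.

```python
def one_hot(values):
    """Encode an unordered categorical variable as one 0/1 column per category."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]

def ordinal_encode(values, order):
    """Encode an ordinal variable as integer ranks following the given order."""
    rank = {category: i for i, category in enumerate(order)}
    return [rank[v] for v in values]

# Hypothetical columns: colors have no logical order, sizes do.
colors = ["red", "green", "red", "blue"]
sizes = ["small", "large", "medium", "small"]

color_categories, color_matrix = one_hot(colors)
size_codes = ordinal_encode(sizes, order=["small", "medium", "large"])

print(color_categories)  # ['blue', 'green', 'red']
print(color_matrix)
print(size_codes)        # [0, 2, 1, 0]
```

The order passed to `ordinal_encode` has to come from domain knowledge; sorting the strings alphabetically would place "large" before "small".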
Classification Pipeline
- Question 1: Structured Data Classification - Hands On Solution