Understanding Structured Data Classification: A Comprehensive Guide

Rumman Ansari, Software Engineer, 2024-08-03 02:49:14

Table of Contents:

Structured Data Classification

Introduction

Classification can be performed on structured or unstructured data.

Let's start with the classification of structured data.

Before defining that, what is classification?

Classification is a technique for categorizing data into a given number of classes. The main goal of a classification problem is to identify the category (class) to which a new data point belongs.

Now, what is structured data?

Any data with a high level of organization can be considered structured data. This includes data in an Excel sheet, a relational database, and so on.
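To make "a high level of organization" concrete, here is a minimal sketch of structured data as rows that share the same named columns. The column names and values (`height_cm`, `weight_kg`, `species`) are made up for illustration and do not come from any real dataset.

```python
import csv
import io

# Hypothetical tabular data: every row has the same named columns.
RAW = """height_cm,weight_kg,species
25,4,cat
60,30,dog
23,5,cat
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = load_rows(RAW)

# Split each row into its input values and the class it belongs to.
features = [(float(r["height_cm"]), float(r["weight_kg"])) for r in rows]
labels = [r["species"] for r in rows]

print(features)  # [(25.0, 4.0), (60.0, 30.0), (23.0, 5.0)]
print(labels)    # ['cat', 'dog', 'cat']
```

Because every row has the same fields, splitting the table into inputs and class labels is a simple column lookup. That regularity is what makes structured data convenient for classification.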

Vocabulary: Classification
  • Classifier: An algorithm that maps the input data to a specific category.

  • Feature: An individual measurable property of a phenomenon being observed.

  • Feature selection: The process of identifying or deriving the most meaningful data (features) from the given input.

  • Classification model: A classification model tries to draw some conclusion from the input values given for training. It will predict the class labels/categories for new data.
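The four terms above can be tied together in a small sketch. It uses a nearest-centroid rule, chosen here only because it fits in a few lines; the article does not prescribe any particular algorithm. The function names (`select_features`, `train`, `predict`) and the sample data are hypothetical.

```python
import math
from collections import defaultdict


def select_features(row, keep):
    """Feature selection: keep only the chosen columns of a row."""
    return [row[name] for name in keep]


def train(samples, labels):
    """Build the model: one centroid (mean feature vector) per class."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    return {
        y: [sum(col) / len(xs) for col in zip(*xs)]
        for y, xs in grouped.items()
    }


def predict(model, x):
    """Classifier: map a feature vector to the class with the nearest centroid."""
    return min(model, key=lambda y: math.dist(model[y], x))


# Made-up training data; "id" is a column that carries no useful signal.
rows = [
    {"height_cm": 25.0, "weight_kg": 4.0, "id": 1},
    {"height_cm": 23.0, "weight_kg": 5.0, "id": 2},
    {"height_cm": 60.0, "weight_kg": 30.0, "id": 3},
    {"height_cm": 55.0, "weight_kg": 25.0, "id": 4},
]
labels = ["cat", "cat", "dog", "dog"]

keep = ["height_cm", "weight_kg"]  # the features we chose to keep
samples = [select_features(r, keep) for r in rows]
model = train(samples, labels)

print(predict(model, [24.0, 4.5]))   # cat
print(predict(model, [58.0, 28.0]))  # dog
```

Here each kept column is a feature, dropping `id` is a simple form of feature selection, `train` produces the classification model, and `predict` plays the role of the classifier for new data.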

Vocabulary: Classification Types
  • Binary classification: Classification task with two possible outcomes. E.g., gender classification (male/female).

  • Multi-class classification: Classification with more than two classes. In multi-class classification, each sample is assigned to one and only one target label. E.g., an animal can be a cat or a dog, but not both at the same time.

  • Multi-label classification: Classification task where each sample is mapped to a set of target labels (more than one class). E.g., a news article can be about sports, a person, and a location at the same time.

  • Supervised classification: A technique where the learning is based on a training set of correctly labeled observations. E.g., email classification where the input data is a set of emails labeled as spam/not spam.

  • Unsupervised classification: Grouping the observations into various categories based on some similarity measures. E.g., grouping of news articles based on their content.
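One way to see the difference between binary, multi-class, and multi-label tasks is to look at the shape of the target values. The sketch below uses made-up labels; `to_indicator` is a hypothetical helper, not part of any library.

```python
# Binary: each sample gets one of exactly two labels.
binary_targets = ["spam", "not spam", "spam"]

# Multi-class: more than two classes, but still exactly one label per sample.
multiclass_targets = ["cat", "dog", "bird"]

# Multi-label: each sample maps to a set of labels.
multilabel_targets = [
    {"sports", "person"},
    {"location"},
    {"sports", "person", "location"},
]


def to_indicator(label_sets, classes):
    """Encode label sets as 0/1 rows, one column per class."""
    return [[1 if c in labels else 0 for c in classes] for labels in label_sets]


classes = ["sports", "person", "location"]
indicator = to_indicator(multilabel_targets, classes)
print(indicator)  # [[1, 1, 0], [0, 0, 1], [1, 1, 1]]
```

In the multi-label case a row can contain several 1s, whereas a multi-class target encoded the same way would have exactly one 1 per row.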

Vocabulary: Statistical Data

Quantitative Variables:

  • Discrete: Numeric variables with a countable number of values between any two values.

  • Continuous: Numeric variables with an infinite number of values between any two values.

Qualitative Variables:

  • Categorical: Variables that have a finite number of categories/groups with no logical order.

  • Ordinal: Variables similar to categorical variables, but with a clear ordering of the categories.

Note: For further reference, read this link.
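The distinction between categorical and ordinal variables matters when values are turned into numbers. The sketch below shows one common convention, assumed here for illustration: ordinal values become ranks that preserve their order, while categorical values become one-hot rows so that no order is implied. The helper names and sample values are hypothetical.

```python
def encode_ordinal(values, order):
    """Map each ordinal value to its rank in the given order."""
    rank = {level: i for i, level in enumerate(order)}
    return [rank[v] for v in values]


def encode_one_hot(values):
    """Map each categorical value to a 0/1 row with a single 1."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Ordinal: small < medium < large, so ranks are meaningful.
sizes = ["small", "large", "medium"]
size_codes = encode_ordinal(sizes, ["small", "medium", "large"])
print(size_codes)  # [0, 2, 1]

# Categorical: colors have no logical order, so use one column per category.
colors = ["red", "blue", "red"]
categories, color_rows = encode_one_hot(colors)
print(categories)  # ['blue', 'red']
print(color_rows)  # [[0, 1], [1, 0], [0, 1]]
```

Encoding colors as ranks would suggest that one color is "greater" than another, which is why the unordered case is handled separately.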

Classification Pipeline
Figure: Classification Pipeline


