ML Concepts – Understanding the Dataset

What is Dataset? Usually dataset refers to the data that you have, it is combined of both dependent as well as independent variables. In ML..

What is Dataset?

Usually dataset refers to the data that you have, it is combined of both dependent as well as independent variables. In ML lingo, dataset is the pair (X, y) where X refers to set of independent variables and y is the target. X is also called the feature set. Moreover, using variables/features from X you can generate other features also.

Question: When you import the dataset for data processing, you need to define two entities. 1) Metrics of Features, 2) Dependent Variable Vector

  • What is metrics of features?
  • What is dependent variable vector?

Answer: Any dataset that you going to train a machine learning model, you have features and dependent variable vector. Features are the column with which you are going to predict a dependent variable, and dependent variable is the last column.

Each feature, or column, represents a measurable piece of data that can be used for analysis: Name, Age, Sex, Fare, and so on. Features are also sometimes referred to as “variables” or “attributes.”

Here, “Purchased” is dependent variable, whereas Age, Salary, Country are features.

Categorical Data = Nominal = String = Qualitative Data = Ordinal = Booleon

Features = Variables = attributes.

Numeric = Integer = Quantitative = Float

Two entities of dataset

  1. Metrics of features
  2. Dependent Variable Vector

Metrics of Features:

Any dataset has entity (means features) and dependent variable vectors.

Features (Independent Variable)Dependent Variable (level, class)
Features are the columns which you are going to predict the dependent variable.Source IP, Source Load, Destination Load, Time, Frequency are features.X = Independent Variable   Iloc = locate indexesIt is the variable which is being tested.It is the last column.Attack_cat (Malicious or Normal) is dependent variable.Dependent variable is also known as Class, or Level.Y = Dependent variable

Features are also sometimes referred to as “variables” or “attributes.”

Data Variables

About The Author

Leave a Reply

Your email address will not be published. Required fields are marked *

About the Author

Dr Pranay Jha

Dr. Pranay Jha is a Cloud and AI Consultant with 18+ years of experience in hybrid cloud, virtualization, and enterprise infrastructure transformation. He specializes in VMware technologies, multi-cloud strategy, and Generative AI solutions. He holds a PhD in Computer Applications with research focused on Cloud and AI, has published multiple research papers, and has been a VMware vExpert since 2016 and a VMUG Community Leader.

BlockSpare — News, Magazine and Blog Addons for (Gutenberg) Block Editor