What is a Dataset?
A dataset usually refers to the data that you have; it combines both dependent and independent variables. In ML lingo, a dataset is the pair (X, y), where X is the set of independent variables and y is the target. X is also called the feature set. You can also derive new features from the variables already in X.
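As a rough illustration of the (X, y) pair and of deriving a new feature from existing ones, here is a minimal sketch in plain Python. The rows, the column meanings (age, salary, purchased), and the derived "salary per year of age" feature are all made up for illustration.

```python
# Hypothetical toy data: each row is (age, salary, purchased).
rows = [
    (25, 40000, 0),
    (32, 60000, 1),
    (47, 82000, 1),
]

# X holds the independent variables (features); y holds the target.
X = [list(row[:-1]) for row in rows]
y = [row[-1] for row in rows]

# Derive a new feature from existing ones: salary divided by age.
X_extended = [features + [features[1] / features[0]] for features in X]

print(X)
print(y)
print(X_extended[0])
```

Each row of X_extended now carries three features instead of two, while y is unchanged.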

Question: When you import a dataset for data processing, you need to define two entities: 1) Matrix of Features, 2) Dependent Variable Vector
- What is the matrix of features?
- What is the dependent variable vector?
Answer: Any dataset that you are going to use to train a machine learning model has a matrix of features and a dependent variable vector. Features are the columns you use to predict the dependent variable, and the dependent variable is, by convention, usually the last column.
Each feature, or column, represents a measurable piece of data that can be used for analysis: Name, Age, Sex, Fare, and so on. Features are also sometimes referred to as “variables” or “attributes.”
Here, “Purchased” is the dependent variable, whereas Age, Salary, and Country are features.
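A minimal sketch of this split using pandas, assuming the dependent variable sits in the last column. Only the column names come from the text above; the row values are hypothetical.

```python
import pandas as pd

# Hypothetical rows; only the column names are taken from the text.
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany"],
    "Age": [44, 27, 30],
    "Salary": [72000, 48000, 54000],
    "Purchased": ["No", "Yes", "No"],
})

# Matrix of features: every row, every column except the last.
X = df.iloc[:, :-1].values

# Dependent variable vector: every row, last column only.
y = df.iloc[:, -1].values

print(X.shape)
print(y)
```

Here `iloc` selects by integer position, so `:-1` means "all columns but the last" and `-1` means "the last column". If the target is not the last column in your data, select it by name instead.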

Categorical Data = Qualitative Data (includes Nominal, Ordinal, Boolean, and String values)
Features = Variables = Attributes
Numeric Data = Quantitative Data (includes Integer and Float values)
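To make the categorical/numeric distinction concrete, here is a small sketch that labels a column by the Python types of its values. The helper `kind_of` and the sample columns are hypothetical, and the rule is deliberately simple.

```python
def kind_of(values):
    """Label a column 'numeric' or 'categorical' from its Python types."""
    def is_number(v):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    return "numeric" if all(is_number(v) for v in values) else "categorical"


# Hypothetical sample columns.
columns = {
    "Country": ["France", "Spain", "Germany"],   # strings
    "Age": [44, 27, 30],                         # integers
    "Salary": [72000.0, 48000.0, 54000.0],       # floats
    "Purchased": [False, True, False],           # Booleans
}

kinds = {name: kind_of(vals) for name, vals in columns.items()}
print(kinds)
```

A type check like this cannot detect categories that are stored as integer codes (for example, 1 = low, 2 = medium, 3 = high); those still need to be marked as categorical by hand.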
Two entities of a dataset
- Matrix of features
- Dependent variable vector
Matrix of Features:
Any dataset has a matrix of features (the independent variables) and a dependent variable vector.
| Features (Independent Variables) | Dependent Variable (label, class) |
| --- | --- |
| Features are the columns you use to predict the dependent variable. Source IP, Source Load, Destination Load, Time, and Frequency are features. X = independent variables. iloc = integer-location based indexing (pandas). | It is the variable being predicted. It is usually the last column. Attack_cat (Malicious or Normal) is the dependent variable. The dependent variable is also known as the class or label. y = dependent variable. |
Data Variables





