What Is a Python Abstract Class? When and How to Use It

Published: June 5, 2025

Written by:
Nathan Rosidi

Learn how abstract classes help structure reusable Python code through a real student data analysis project

You probably already know object-oriented programming (OOP), which lets you write reusable code. Abstract classes are one of its core concepts, and this article covers them in depth. As we will see, they help you avoid duplicating code.

AI gets most of the attention these days, but by the end of this article you will see how a Python abstract class can organize and automate a data project: we will analyze student achievement at two Portuguese schools using statistical methods.

But first: what is an abstract class, and why and when should you use one? Let’s answer these questions, get the logic down, and then start the project.

What Is an Abstract Class in Python?

An abstract class is a blueprint for other classes that you design later.

You never create an object (an “instance”) directly from an abstract class.

Its primary purpose is to set out a common structure: it defines the rules and methods that other, more concrete classes must follow.

For instance, picture an abstract class Vehicle that defines the blueprint, and a Car class that builds on it to create real objects.

But how?

An abstract class sets up a list of functions that any class using it has to include. It doesn’t say how those functions work; it only makes sure they exist.

This keeps your code clean and consistent. Everything follows the same pattern, but you still have the freedom to decide the details.
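
To make the Vehicle and Car idea concrete, here is a minimal sketch using Python's built-in abc module. The method name start_engine() is our own placeholder, not part of any library; the point is only that Vehicle declares the method and Car supplies it.

```python
from abc import ABC, abstractmethod


class Vehicle(ABC):
    """Blueprint: states what every vehicle must do, not how."""

    @abstractmethod
    def start_engine(self):
        """Each concrete vehicle decides how it starts (hypothetical method)."""


class Car(Vehicle):
    """Concrete class: fills in the detail the blueprint left open."""

    def start_engine(self):
        return "Car engine started"


car = Car()
print(car.start_engine())

try:
    Vehicle()  # an abstract class cannot be instantiated directly
except TypeError as err:
    print(f"TypeError: {err}")
```

Creating a Car works because it implements everything Vehicle requires, while creating a Vehicle itself raises a TypeError.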

Before diving into abstract classes, it's helpful to understand how a `class` in Python works. If you're unsure how class-level methods work, check out this practical guide on Python class methods.

Why and When to Use Abstract Classes in Python

Abstract classes let us define one shared structure for a family of related classes (in our project, the data analyzers), which avoids duplication.

They are useful when you want multiple classes to follow the same structure. They make sure each class includes specific functions, keeping your code more organized and easier to manage.
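
Here is a small sketch of that guarantee, assuming a toy Analyzer base class with two hypothetical subclasses (all names are illustrative only). Python checks for missing abstract methods at the moment you try to create the object.

```python
from abc import ABC, abstractmethod


class Analyzer(ABC):
    @abstractmethod
    def load_data(self):
        pass

    @abstractmethod
    def clean_data(self):
        pass


class IncompleteAnalyzer(Analyzer):
    # Only one of the two required methods is implemented.
    def load_data(self):
        return "loaded"


class CompleteAnalyzer(Analyzer):
    def load_data(self):
        return "loaded"

    def clean_data(self):
        return "cleaned"


def can_instantiate(cls):
    """Return True if cls() succeeds, False if Python rejects it."""
    try:
        cls()
        return True
    except TypeError:
        return False


print(can_instantiate(IncompleteAnalyzer))  # False: clean_data is missing
print(can_instantiate(CompleteAnalyzer))    # True: every method is present
print(sorted(IncompleteAnalyzer.__abstractmethods__))  # ['clean_data']
```

The subclass that forgets clean_data() is rejected before it can be used, which is how the base class keeps every analyzer consistent.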

If your project follows a clear step-by-step process, abstract classes help you keep things organized. They let you define shared steps once and reuse them across similar tasks. You'll see how helpful this is when we start applying it in the upcoming data project.
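
One way to define shared steps once is to put the step order in a regular (non-abstract) method on the base class and leave only the individual steps abstract. The sketch below uses hypothetical Pipeline classes and a plain list as a log. It is a variation for illustration only; the project code later in this article declares run() as abstract instead.

```python
from abc import ABC, abstractmethod


class Pipeline(ABC):
    def __init__(self):
        self.log = []  # records which steps ran, in order

    @abstractmethod
    def load(self):
        pass

    @abstractmethod
    def clean(self):
        pass

    def run(self):
        """Shared step order, written once and inherited by every subclass."""
        self.load()
        self.clean()
        return self.log


class MathPipeline(Pipeline):
    def load(self):
        self.log.append("math loaded")

    def clean(self):
        self.log.append("math cleaned")


class PortuguesePipeline(Pipeline):
    def load(self):
        self.log.append("portuguese loaded")

    def clean(self):
        self.log.append("portuguese cleaned")


print(MathPipeline().run())
print(PortuguesePipeline().run())
```

Neither subclass defines run(), yet both execute their steps in the same order.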

Let’s discover it by doing a data project.

Applying Abstract Classes to a Real Data Project: Analysing Student Performance

Link to this data project: https://platform.stratascratch.com/data-projects/student-performance-analysis

Project Description

This project aims to analyze student achievement in Mathematics and the Portuguese language based on data from two Portuguese schools. To start the project, let’s first load and clean the data.

Class Structure Overview

To see the power of abstraction in this project, let’s create the two core classes we’ll use:

StudentDataAnalyzer
MathDataAnalyzer

StudentDataAnalyzer is an abstract base class. It defines the structure and expectations: methods like load_data(), clean_data(), explore_data(), and others are declared with @abstractmethod, and their bodies are left empty using the pass keyword. This is by design. The class doesn't do the analysis; it just declares what every subclass must do.

This brings us to MathDataAnalyzer, a concrete subclass. It inherits from StudentDataAnalyzer and implements all those methods with real logic tailored to our student dataset. This pattern is grounded in the object-oriented programming principles we discussed earlier: the abstract class serves as the blueprint, and the concrete class creates the actual object.

This separation ensures code reusability, enforces structure, and keeps logic modular. In practice, if you wanted to analyze a different dataset (say Portuguese scores or science scores), you'd just create a new subclass; no need to rewrite shared logic.
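
As a rough sketch of what "just create a new subclass" could look like, the example below keeps the loading and averaging logic in the base class and lets each subclass supply only its subject name. The class names, the subject() method, and the tiny in-memory CSV strings are all made up for illustration, and the standard csv module stands in for pandas so the example runs on its own.

```python
import csv
import io
from abc import ABC, abstractmethod


class CsvScoreAnalyzer(ABC):
    def __init__(self, raw_text):
        self.raw_text = raw_text
        self.rows = []

    def load_data(self):
        # Shared logic: both datasets are assumed to be semicolon-separated.
        reader = csv.DictReader(io.StringIO(self.raw_text), delimiter=";")
        self.rows = list(reader)

    def mean_final_grade(self):
        # Shared logic: average of the final grade column G3.
        grades = [int(row["G3"]) for row in self.rows]
        return sum(grades) / len(grades)

    @abstractmethod
    def subject(self):
        pass


class MathScores(CsvScoreAnalyzer):
    def subject(self):
        return "Mathematics"


class PortugueseScores(CsvScoreAnalyzer):
    def subject(self):
        return "Portuguese"


# Made-up sample rows, not taken from the real dataset.
math_csv = "sex;G3\nF;10\nM;14\n"
por_csv = "sex;G3\nF;12\nM;13\nF;14\n"

for analyzer in (MathScores(math_csv), PortugueseScores(por_csv)):
    analyzer.load_data()
    print(f"{analyzer.subject()}: mean G3 = {analyzer.mean_final_grade():.1f}")
```

Adding the second subject took a three-line class; the loading and averaging code was not touched.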

Load and Clean Data Using a Python Abstract Class

The loading and cleaning functions are defined inside the MathDataAnalyzer class, which is responsible for implementing the analysis steps. We define them here rather than in the StudentDataAnalyzer base class because the base class is abstract; it only outlines which functions every subclass must have, without explaining how they work.

Each abstract method in the base class uses pass, acting as a placeholder and enforcing structure.

The MathDataAnalyzer, as a concrete subclass, is where the logic comes to life. For example, the load_data() method reads a CSV file with student performance data, and clean_data() fills missing values with zero.

import pandas as pd
from abc import ABC, abstractmethod

class StudentDataAnalyzer(ABC):
    def __init__(self, filepath):
        self.filepath = filepath
        self.data = None

    @abstractmethod
    def load_data(self):
        pass

    @abstractmethod
    def clean_data(self):
        pass

    @abstractmethod
    def explore_data(self):
        pass

    @abstractmethod
    def visualize_data(self):
        pass

    @abstractmethod
    def perform_statistical_analysis(self):
        pass

    @abstractmethod
    def run(self):
        pass

class MathDataAnalyzer(StudentDataAnalyzer):
    def load_data(self):
        self.data = pd.read_csv(self.filepath, sep=';')
        print("✅ Math data loaded.")

    def clean_data(self):
        self.data.fillna(0, inplace=True)
        print("🧹 Math data cleaned.")

    def explore_data(self):
        print("🔎 Data exploration not yet implemented.")

    def visualize_data(self):
        print("📊 Visualization not yet implemented.")

    def perform_statistical_analysis(self):
        print("📈 Statistical analysis not yet implemented.")

    def run(self):
        # Single pipeline entry point: every step runs in a fixed order.
        self.load_data()
        self.clean_data()
        self.explore_data()
        self.visualize_data()
        self.perform_statistical_analysis()

That is the structure we described. The abstract class ensures that every analyzer follows the same flow. But how would you run it?

How to Run It?

Here is the code:

math_analyzer = MathDataAnalyzer('student-mat.csv')
math_analyzer.run()

Now let’s see the output.

Within a couple of seconds, the script loads and cleans the data and prints placeholder messages for the steps still to come.

While the dataset used here is already clean and contains no missing values, the cleaning step is still important for maintaining a consistent workflow, especially when adapting the script for different datasets that might contain nulls or irregularities.

Exploring Data Through a Python Abstract Class Implementation

Now that we’ve cleaned the data, we will explore it and understand what we’re working with.

The explore_data() method in the MathDataAnalyzer class does this by printing the first few rows, showing data types and column info, and summarizing statistics.

Remember, the exploration logic is written inside the concrete class, not the abstract base class. Each subject or dataset might need different ways to explore the data. So we left the structure in StudentDataAnalyzer and filled in the details in MathDataAnalyzer.

Here is the code to explore the data.

import pandas as pd
from abc import ABC, abstractmethod

class StudentDataAnalyzer(ABC):
    def __init__(self, filepath):
        self.filepath = filepath
        self.data = None

    @abstractmethod
    def load_data(self):
        pass

    @abstractmethod
    def clean_data(self):
        pass

    @abstractmethod
    def explore_data(self):
        pass

    @abstractmethod
    def visualize_data(self):
        pass

    @abstractmethod
    def perform_statistical_analysis(self):
        pass

    @abstractmethod
    def run(self):
        pass

class MathDataAnalyzer(StudentDataAnalyzer):
    def load_data(self):
        self.data = pd.read_csv(self.filepath, sep=';')
        print("✅ Math data loaded.")

    def clean_data(self):
        self.data.fillna(0, inplace=True)
        print("🧹 Math data cleaned.")

    def explore_data(self):
        print("\n🔍 First 5 Rows:")
        print(self.data.head())

        print("\n📋 Data Info:")
        self.data.info()  # info() prints its own report and returns None, so no print() wrapper

        print("\n📈 Descriptive Statistics:")
        print(self.data.describe())

    def visualize_data(self):
        print("📊 Visualization not yet implemented.")

    def perform_statistical_analysis(self):
        print("📈 Statistical analysis not yet implemented.")

    def run(self):
        self.load_data()
        self.clean_data()
        self.explore_data()
        self.visualize_data()
        self.perform_statistical_analysis()

Before proceeding with the analysis, we must first understand the data, and this step does exactly that. Let’s see how to run it once again.

How to Run It?

Here is the code:

math_analyzer = MathDataAnalyzer('student-mat.csv')
math_analyzer.run()

Now let’s see the output.

Good, but we are not done yet.

Visualizing Data with a Python Abstract Class Framework

We’ve explored the data, so it’s time to visualize it. We’ll keep this short, since we’ve already explained how each function fits into the project structure.

The visualize_data() method is part of the MathDataAnalyzer class, like the others. We put it here because the visuals and what we want to highlight can change based on the subject.

Now that we know what the data looks like, let’s create some visualizations. Here is the code.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from abc import ABC, abstractmethod

class StudentDataAnalyzer(ABC):
    def __init__(self, filepath):
        self.filepath = filepath
        self.data = None

    @abstractmethod
    def load_data(self):
        pass

    @abstractmethod
    def clean_data(self):
        pass

    @abstractmethod
    def explore_data(self):
        pass

    @abstractmethod
    def visualize_data(self):
        pass

    @abstractmethod
    def perform_statistical_analysis(self):
        pass

    @abstractmethod
    def run(self):
        pass

class MathDataAnalyzer(StudentDataAnalyzer):
    def load_data(self):
        self.data = pd.read_csv(self.filepath, sep=';')
        print("✅ Math data loaded.")

    def clean_data(self):
        self.data.fillna(0, inplace=True)
        print("🧹 Math data cleaned.")

    def explore_data(self):
        print("\n🔍 First 5 Rows:")
        print(self.data.head())

        print("\n📋 Data Info:")
        self.data.info()  # info() prints its own report and returns None, so no print() wrapper

        print("\n📈 Descriptive Statistics:")
        print(self.data.describe())

    def visualize_data(self):
        print("\n📌 Dataset Columns:")
        print(self.data.columns.tolist())

        print("\n📊 Histograms for Grades and Study Time:")
        features = ['G1', 'G2', 'G3', 'age', 'studytime']
        self.data[features].hist(figsize=(10, 6), bins=15)
        plt.tight_layout()
        plt.show()

        print("\n📦 Box Plot: G3 by Gender")
        plt.figure(figsize=(6, 4))
        sns.boxplot(data=self.data, x='sex', y='G3')
        plt.title('Final Grade (G3) by Gender')
        plt.show()

        print("\n📉 Bar Plot: Avg G3 by Internet Access")
        plt.figure(figsize=(6, 4))
        sns.barplot(data=self.data, x='internet', y='G3', estimator='mean', errorbar=None)
        plt.title('Average G3 by Internet Access')
        plt.show()

        print("\n📊 Bar Plot: Avg G3 by Mother's Education (Medu)")
        plt.figure(figsize=(6, 4))
        sns.barplot(data=self.data, x='Medu', y='G3', estimator='mean', errorbar=None)
        plt.title("Average G3 by Mother's Education")
        plt.show()

    def perform_statistical_analysis(self):
        print("📈 Statistical analysis not yet implemented.")

    def run(self):
        self.load_data()
        self.clean_data()
        self.explore_data()
        self.visualize_data()
        self.perform_statistical_analysis()

You know what comes next? Let’s see how to run it.

How to Run It?

math_analyzer = MathDataAnalyzer('student-mat.csv')
math_analyzer.run()

Let’s see the output.

Statistical Modeling Within a Python Abstract Class Setup

In this step, we apply two basic statistical tests. The first is a t-test, which checks whether students with internet access have significantly different final grades than those without. The second is a chi-squared test, which examines the relationship between parental cohabitation status and whether a student passes or fails.
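
The code for this step passes equal_var=False to ttest_ind, which selects Welch's t-test (the version that does not assume equal variances in the two groups). As a rough sketch of what that statistic measures, here is the formula written with the standard library only. The function name welch_t_statistic and the two short grade lists are made up for illustration and are not taken from the dataset; the sketch computes only the t-statistic, not the p-value.

```python
from math import sqrt
from statistics import mean, variance


def welch_t_statistic(sample_a, sample_b):
    """Difference in means divided by its estimated standard error."""
    var_a = variance(sample_a)  # sample variance (divides by n - 1)
    var_b = variance(sample_b)
    standard_error = sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error


# Hypothetical final grades, for illustration only.
with_internet = [10, 12, 14, 16]
without_internet = [8, 10, 12]

t_stat = welch_t_statistic(with_internet, without_internet)
print(f"T-statistic: {t_stat:.2f}")  # T-statistic: 1.73
```

A larger absolute t-statistic means the gap between the group means is large relative to the spread within the groups.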

Good, now we are at the final step: statistical modeling. Let’s see the code:

from abc import ABC, abstractmethod
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import ttest_ind, chi2_contingency

class StudentDataAnalyzer(ABC):
    def __init__(self, filepath):
        self.filepath = filepath
        self.data = None

    @abstractmethod
    def load_data(self):
        pass

    @abstractmethod
    def clean_data(self):
        pass

    @abstractmethod
    def explore_data(self):
        pass

    @abstractmethod
    def visualize_data(self):
        pass

    @abstractmethod
    def perform_statistical_analysis(self):
        pass

    @abstractmethod
    def run(self):
        pass

class MathDataAnalyzer(StudentDataAnalyzer):
    def load_data(self):
        self.data = pd.read_csv(self.filepath, sep=';')
        print("✅ Math data loaded.")

    def clean_data(self):
        self.data.fillna(0, inplace=True)
        print("🧹 Math data cleaned.")

    def explore_data(self):
        print("\n🔍 First 5 Rows:")
        print(self.data.head())

        print("\n📋 Data Info:")
        self.data.info()  # info() prints its own report and returns None, so no print() wrapper

        print("\n📈 Descriptive Statistics:")
        print(self.data.describe())

    def visualize_data(self):
        print("\n📌 Dataset Columns:")
        print(self.data.columns.tolist())

        print("\n📊 Histograms for Grades and Study Time:")
        features = ['G1', 'G2', 'G3', 'age', 'studytime']
        self.data[features].hist(figsize=(10, 6), bins=15)
        plt.tight_layout()
        plt.show()

        print("\n📦 Box Plot: G3 by Gender")
        plt.figure(figsize=(6, 4))
        sns.boxplot(data=self.data, x='sex', y='G3')
        plt.title('Final Grade (G3) by Gender')
        plt.show()

        print("\n📉 Bar Plot: Avg G3 by Internet Access")
        plt.figure(figsize=(6, 4))
        sns.barplot(data=self.data, x='internet', y='G3', estimator='mean', errorbar=None)
        plt.title('Average G3 by Internet Access')
        plt.show()

        print("\n📊 Bar Plot: Avg G3 by Mother's Education (Medu)")
        plt.figure(figsize=(6, 4))
        sns.barplot(data=self.data, x='Medu', y='G3', estimator='mean', errorbar=None)
        plt.title("Average G3 by Mother's Education")
        plt.show()

    def perform_statistical_analysis(self):
        print("\n🔬 T-Test: Internet Access vs Final Grades")
        internet_yes = self.data[self.data['internet'] == 'yes']['G3']
        internet_no = self.data[self.data['internet'] == 'no']['G3']

        t_stat, p_val = ttest_ind(internet_yes, internet_no, equal_var=False)
        print(f"T-statistic: {t_stat:.2f}, P-value: {p_val:.4f}")
        if p_val < 0.05:
            print("✅ Statistically significant difference in grades based on internet access.")
        else:
            print("❌ No statistically significant difference found.")

        print("\n🧮 Chi-Square Test: Parental Cohabitation vs Grade Performance")
        self.data['performance'] = self.data['G3'].apply(lambda x: 'pass' if x >= 10 else 'fail')
        contingency = pd.crosstab(self.data['Pstatus'], self.data['performance'])

        chi2, p, dof, expected = chi2_contingency(contingency)
        print(f"Chi2: {chi2:.2f}, P-value: {p:.4f}")
        if p < 0.05:
            print("✅ Significant association between parental status and student performance.")
        else:
            print("❌ No significant association detected.")

    def run(self):
        self.load_data()
        self.clean_data()
        self.explore_data()
        self.visualize_data()
        self.perform_statistical_analysis()

How to Run It?

math_analyzer = MathDataAnalyzer('student-mat.csv')
math_analyzer.run()

Here is the output.

I left the statistical analysis to you, but if you want to see the full report, visit here.
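
If you take up that analysis yourself, it may help to see what a chi-squared statistic is made of. The sketch below computes the plain Pearson statistic by hand for a made-up 2x2 table of counts (the function name and the numbers are hypothetical). Note that scipy's chi2_contingency applies Yates' continuity correction by default on 2x2 tables, so its result will differ from this uncorrected value unless you pass correction=False.

```python
def pearson_chi_square(table):
    """Sum of (observed - expected)^2 / expected over every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count we would expect in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts: rows are two parental-status groups, columns are fail/pass.
counts = [[10, 20],
          [20, 10]]

chi2 = pearson_chi_square(counts)
print(f"Chi2 (uncorrected): {chi2:.2f}")  # Chi2 (uncorrected): 6.67
```

When the observed counts sit close to the expected ones, the statistic stays near zero and the test finds no association.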

Final Thoughts

In this article, we saw how a Python abstract class brings structure and clarity to a data project. It helped us lay out the key steps like loading, cleaning, and analyzing without repeating code.

We kept things clean and flexible by separating the structure into an abstract base class and the logic into a concrete class. This makes it easy to build new analyzers later while sticking to a clear and reusable format.

If you're preparing for interviews and want to reinforce your understanding of Python concepts like abstract classes, be sure to review these Python interview questions often asked by hiring managers.
