AI-Powered Educational Assessment Platform

This AI-powered educational assessment platform streamlines the grading process for educators, providing automated question generation, customizable scoring, and real-time insights across multiple subjects. By leveraging NLP and machine learning, it reduces manual effort, enhances accuracy in grading diverse question types, and delivers personalized feedback to students. The 24/7 AI assistant improves student learning outcomes by identifying weak areas and offering continuous support. This platform not only saves time for educators but also adds value for institutions aiming to modernize assessments and foster deeper student engagement, ultimately leading to improved academic performance and operational efficiency.

Year
2021-2022

Introduction

In today’s digital education landscape, the need for personalized, scalable, and efficient assessment tools is paramount. This AI-powered application bridges the gap between traditional grading methods and modern educational demands, offering a robust platform where teachers can seamlessly generate, administer, and analyze question papers while students receive instant, insightful feedback.

Designed to support diverse subjects, from philosophy to science, this application processes data through sophisticated NLP models, providing an adaptive grading experience that handles various question types—MCQs, Point-Based, and Subjective—with tailored scoring methods. Teachers and students each have their own portal, ensuring streamlined interactions and data management.

Key to the application’s appeal is its 24/7 AI assistant, which guides students in understanding mistakes, enhances learning through personalized feedback, and provides virtual support whenever needed. By harnessing advanced data preprocessing, custom scoring mechanisms, and an AI-driven feedback loop, this platform creates a holistic and adaptable educational experience, empowering both students and educators alike.

Architecture Overview

This architecture is structured to efficiently process, store, and retrieve question-answer pairs while maintaining accuracy and scalability. It consists of a Data Layer, Embedding Layer, Prompt Handling Layer, and Inference & Scoring Layer, with optional Agent Integration for enhanced reasoning and dynamic query handling.

1. Data Layer

Purpose: Store and preprocess question-answer pairs for later retrieval.

  • Technology Stack:

    • Data Storage: qa_database.csv (local CSV storage) or scalable cloud-based options (e.g., Databricks for larger datasets).

    • Data Preprocessing: Script store_data.py to load and preprocess data, preparing it for embedding and storage.

    • Data Transformation: Converts questions and answers into a format suitable for embedding and stores them in the database.

Workflow:

  • The store_data.py script loads questions and answers into qa_database.csv, where they undergo preprocessing (text normalization and basic cleaning).

  • Scaling Option: For larger datasets, the database can be moved to a cloud-based system like Databricks for batch processing and transformation.
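The preprocessing step can be sketched as follows. This is a minimal illustration of what store_data.py might do; the specific normalization rules (lowercasing, whitespace collapsing, symbol stripping) are assumptions, not the project's exact pipeline:

```python
import csv
import re

def normalize(text: str) -> str:
    """Basic text cleaning: lowercase, collapse whitespace, drop stray symbols."""
    text = text.lower().strip()
    text = re.sub(r"\s+", " ", text)          # collapse runs of whitespace
    text = re.sub(r"[^\w\s.,?'-]", "", text)  # keep words and sentence punctuation
    return text

def store_qa_pairs(pairs, path="qa_database.csv"):
    """Write normalized (question, answer) pairs to the CSV used by later layers."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["question", "answer"])
        for question, answer in pairs:
            writer.writerow([normalize(question), normalize(answer)])
```

Keeping normalization in one function means the same cleaning can be reused when student answers are embedded later, so stored and incoming text stay comparable.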

2. Embedding Layer

Purpose: Convert questions and answers into vector embeddings for semantic similarity.

  • Technology Stack:

    • Embedding Generation: Sentence Transformers library (BERT-style or similar transformer models).

    • Storage Options:

      • CSV (small-scale storage): embed_answers_csv.py to store embeddings locally.

      • Vector Database (large-scale storage): Chroma or Pinecone for vector-based storage of embeddings.

Workflow:

  • The embed_answers_csv.py and embed_answers_chromadb.py scripts create embeddings for each question-answer pair and store them either in CSV or in a vector database.

  • High-Dimensional Embeddings: Sentence Transformers generate dense, high-dimensional vectors that retain semantic information for fast, meaningful comparisons during query retrieval.
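The embed-and-store loop in embed_answers_csv.py could look like the sketch below. In the real script the encoder would be something like SentenceTransformer("all-MiniLM-L6-v2").encode; here an encode_fn is injected so the flow can be shown without the model, and the column names are assumptions:

```python
import csv
import json

def embed_answers(pairs, encode_fn):
    """Attach a JSON-serialized embedding to each (question, answer) pair.

    encode_fn stands in for a Sentence Transformers encode call and must
    return a sequence of floats for a given string.
    """
    rows = []
    for question, answer in pairs:
        vector = encode_fn(answer)  # dense semantic vector for the reference answer
        rows.append({"question": question,
                     "answer": answer,
                     "embedding": json.dumps([float(x) for x in vector])})
    return rows

def write_embeddings(rows, path="qa_embeddings.csv"):
    """Persist the embedded pairs for later similarity lookups."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["question", "answer", "embedding"])
        writer.writeheader()
        writer.writerows(rows)
```

Serializing vectors as JSON keeps the CSV path simple; swapping write_embeddings for a Chroma or Pinecone upsert is the scaling step the document describes.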

3. Prompt Handling Layer

Purpose: Construct and manage prompts for LLMs, integrating contextual data and few-shot examples to improve query handling.

  • Technology Stack:

    • Prompt Engineering and Orchestration: LangChain for constructing prompts, with few-shot examples and advanced prompt structures.

    • Real-Time Data Integration: SerpAPI for dynamic data querying, adding real-time information to prompts to reduce hallucinations.

Workflow:

  • The langchain_tool.py script uses LangChain to create prompts that integrate contextual data (few-shot examples).

  • For real-time queries, SerpAPI fetches the latest data (e.g., current events), which is incorporated into the prompt before it is sent to the LLM.
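The shape of the assembled prompt can be sketched in plain Python. The real langchain_tool.py would use LangChain's prompt templates (e.g., FewShotPromptTemplate); this sketch only shows the structure that gets sent to the LLM, with invented section wording:

```python
def build_few_shot_prompt(question, examples, live_context=""):
    """Assemble a few-shot prompt: worked examples, optional fresh context, the query."""
    parts = ["Answer the question thoughtfully, following the style of these examples.\n"]
    for ex_question, ex_answer in examples:
        parts.append(f"Question: {ex_question}\nAnswer: {ex_answer}\n")
    if live_context:  # e.g., a snippet fetched via SerpAPI for current-events questions
        parts.append(f"Relevant up-to-date context: {live_context}\n")
    parts.append(f"Question: {question}\nAnswer:")
    return "\n".join(parts)
```

For the document's example, build_few_shot_prompt("Can humans achieve world peace and why?", examples) would prepend similar philosophy questions and their model answers before the student's query.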

4. Inference & Scoring Layer

Purpose: Generate answers based on student queries and calculate scores using semantic similarity and other metrics.

  • Technology Stack:

    • Open-Source LLM: FLAN-T5 or Llama-based models for local deployment.

    • Prompt Execution: flan_llm.py handles prompt-based inference for questions.

Scoring System:

  • Semantic Similarity: Calculates a score from the similarity (typically cosine) between the student's embedded answer and the stored reference embedding.

  • Weighted Scoring (Optional): Add scoring weights based on subject complexity, question type, or other metadata.

  • Scripts:

    • Comparison Script: Retrieves the embedded answer, calculates similarity with student response, and returns a score.

Example:

  • A question like "Can humans achieve world peace and why?" will trigger a prompt with a few-shot example of similar questions, enhancing the model’s response quality.

5. Agent Integration (Optional)

Purpose: Enhance adaptability and reasoning for dynamic, complex queries, and mitigate hallucination issues.

  • Technology Stack:

    • LangChain Agents: Integrates with LangChain to allow models to interact with external tools or apply self-reflection.

    • Tooling: SerpAPI or other APIs to handle up-to-date information needs.

Workflow:

  • Agents provide additional logic and self-improvement capabilities, helping the application adapt to new types of questions and providing real-time information updates.

  • LangChain’s agent framework can facilitate actions like tool use or memory access, allowing the model to solve complex problems that require reasoning.
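The essence of the agent loop, hand-rolled to show the idea. LangChain's agent framework automates this decide-act-observe cycle by letting the LLM itself pick the tool; the keyword routing rule and tool name here are invented purely for illustration:

```python
def run_agent(query, tools, llm_answer, max_steps=3):
    """Decide-act-observe loop: consult a tool when the query needs fresh facts,
    otherwise answer directly from the model."""
    observations = []
    for _ in range(max_steps):
        # Toy routing rule: a real agent asks the LLM which tool (if any) to call.
        if "latest" in query.lower() and "search" in tools and not observations:
            observations.append(tools["search"](query))  # act, then observe
            continue
        return llm_answer(query, observations)           # enough context: answer
    return llm_answer(query, observations)
```

The max_steps cap mirrors the iteration limits real agent frameworks impose so a confused model cannot loop on tool calls indefinitely.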

6. Application Front-End

Purpose: Provide a simple, user-friendly interface for question-answer interactions, score visualization, and feedback.

  • Technology Stack:

    • Frontend Framework: React or Vue.js for a responsive, interactive UI.

    • Backend API: FastAPI or Flask to manage the question-answering flow, from question submission to scoring and feedback.

Features:

  • Question Submission: Allows teachers to submit question papers and view scoring results.

  • Real-Time Scoring: Students receive scores and feedback immediately after submission.

  • Feedback and Analysis: Displays personalized feedback and insights on performance.

  • AI Assistant: Provides virtual support to students and teachers alike.

Data Flow and Integration

  1. Data Ingestion (Data Layer): Teacher-submitted questions are stored in qa_database.csv and preprocessed.

  2. Embedding Creation (Embedding Layer): Answers are embedded and stored as vectors.

  3. Prompt Generation and Execution (Prompt Handling & Inference Layers): The system constructs prompts and sends them to an LLM (e.g., FLAN-T5) for response generation.

  4. Answer Comparison and Scoring (Inference Layer): The student’s answer is compared with the stored vector embedding, and a score is generated.

  5. Agent Interaction (Optional): Agents dynamically enhance response quality, pulling in real-time data or reasoning to refine answers.

  6. UI Interaction (Front-End): Students and teachers interact with the platform to view scores, feedback, and analyses.
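The six steps above can be composed as one function. Each stage is injected as a callable so this sketch stays independent of the concrete model and storage choices; every name here is illustrative rather than taken from the project's code:

```python
def grade_submission(question, student_answer, *,
                     fetch_reference, embed, similarity, feedback):
    """End-to-end flow: retrieve the reference answer, embed both answers,
    score by similarity, and produce feedback for the student."""
    reference_answer = fetch_reference(question)       # Data / Embedding layers
    reference_vec = embed(reference_answer)
    student_vec = embed(student_answer)
    score = similarity(student_vec, reference_vec)     # Inference & Scoring layer
    return {"score": score,                            # payload for the front-end
            "feedback": feedback(question, student_answer, score)}
```

A FastAPI or Flask route would wrap exactly this call, taking the question and student answer from the request body and returning the resulting dict as JSON.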