Lumi Chatbot - Comprehensive Codebase Analysis Report

Generated: October 21, 2025
Analyst: AI Code Reviewer


Executive Summary

This is a multi-technology chatbot platform combining Laravel (PHP backend), Python (Streamlit UI), LangChain (AI framework), Pinecone (vector database), and OpenAI (LLM). The system allows creation of multiple chatbots with custom training data, deployable via an embed widget.


1. Technologies & Frameworks Used

Backend (Laravel - PHP)

- Laravel framework with Filament admin panel
- Laravel Sanctum (installed but unused - see Section 3)
- SQLite/MySQL via Eloquent

Frontend/Chatbot Interface (Python)

- Streamlit chat UI
- LangChain orchestration (langchain-openai, langchain-pinecone)
- requests + python-dotenv for config and API access

Supporting Technologies

- Pinecone vector database (one namespace per bot)
- OpenAI GPT-4o for chat, text-embedding-3-small for embeddings
- Vanilla JavaScript embed widget (chatbot-embed.js)

Development Tools

- Composer (PHP dependencies), pip/requirements.txt (Python dependencies)
- Vite (configured but unused - see Section 3)
- Shell scripts for launching the Streamlit service


2. How It Works (High-Level Architecture)

System Flow

┌─────────────────────────────────────────────────────────────┐
│                     ADMIN INTERFACE                          │
│          (Filament @ lumi-public/index.php)                  │
│                                                               │
│  • Create/Edit Bots (slug, prompts, namespace)               │
│  • Upload Training Documents (files/URLs)                    │
│  • Manage bot configurations                                │
└────────────────────┬────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────────┐
│                  LARAVEL BACKEND                             │
│              (lumi-backend/)                                 │
│                                                               │
│  Models:                                                      │
│    • Bot (name, slug, role_prompt, system_prompt_template,   │
│            pinecone_namespace)                               │
│    • TrainingDocument (bot_id, type, source, status)         │
│                                                               │
│  API Routes (/api):                                          │
│    • GET /bots/{slug} - Fetch bot configuration             │
│    • GET /bots - List all bots (route exists but no impl)   │
│                                                               │
│  Storage:                                                     │
│    • SQLite/MySQL database                                   │
└────────────────────┬────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────────┐
│              PYTHON CHATBOT SERVICE                          │
│           (lumi-backend/chatbot/app.py)                      │
│            Run via: run_chatbot_fixed.sh                     │
│                                                               │
│  1. Fetch bot config from Laravel API                        │
│  2. Initialize Pinecone with bot's namespace                 │
│  3. Accept user messages via Streamlit UI                    │
│  4. Retrieve relevant context from Pinecone                  │
│  5. Build prompt with system template + context              │
│  6. Call OpenAI GPT-4 for response                          │
│  7. Display response in Streamlit chat                       │
└────────────────────┬────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────────┐
│                 VECTOR DATABASE                              │
│                   (Pinecone)                                 │
│                                                               │
│  • Stores embeddings per bot (via namespace)                │
│  • Each namespace = one bot's knowledge base                 │
│  • Embeddings created via text-embedding-3-small            │
│  • Similarity search for context retrieval                   │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│                  EMBED WIDGET                                │
│         (chatbot/public/chatbot-embed.js)                    │
│                                                               │
│  • Lightweight JavaScript widget                             │
│  • Creates floating chat button                              │
│  • Opens iframe pointing to Streamlit app with bot slug      │
│  • Customizable position, color, size                        │
└─────────────────────────────────────────────────────────────┘

Data Flow for a Chat Interaction

  1. User opens website with embed code
  2. JavaScript widget loads and creates chat button
  3. User clicks chat button → iframe opens with ?bot=<slug>
  4. Streamlit app receives bot slug from URL parameter
  5. Streamlit calls Laravel API /api/bots/{slug} to get bot config
  6. Bot config returned (role_prompt, system_prompt_template, pinecone_namespace)
  7. User types message in Streamlit chat
  8. LangChain queries Pinecone for relevant context (top 3 docs, >0.5 similarity)
  9. System prompt built from template + context
  10. OpenAI GPT-4 receives: role_prompt + chat history + dynamic system prompt + user message
  11. Response displayed in Streamlit chat
  12. Conversation continues with full history in session state
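Steps 8-10 amount to assembling a message list for each turn. The sketch below shows that assembly; the `Message` type and `build_messages` function are illustrative, not the actual code in app.py:

```python
from dataclasses import dataclass

@dataclass
class Message:
    role: str      # "system", "user", or "assistant"
    content: str

def build_messages(role_prompt, system_template, context_docs, history, user_input):
    """Assemble the message list sent to the LLM for one turn.

    The dynamic system prompt is produced by filling the bot's template
    with the context retrieved from Pinecone for this question.
    """
    context = "\n\n".join(context_docs)
    dynamic_system = system_template.format(context=context)
    return (
        [Message("system", role_prompt)]    # bot persona, set once
        + history                           # prior turns from session state
        + [Message("system", dynamic_system),
           Message("user", user_input)]
    )
```

The key property is that only the dynamic system message changes between turns; the role prompt and history are carried forward unchanged.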

Training Data Management

Manual Process (Current):

- Admin uploads training documents via Filament
- Documents stored in Laravel database with status: pending
- Manual step: Admin must run the manage_documents.py CLI script to upload to Pinecone
- Script chunks text (1000 chars, 200 overlap) and creates embeddings
- Embeddings uploaded to the bot's Pinecone namespace
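The chunking step (1000-character chunks with 200-character overlap) can be reproduced with a short helper. This is a sketch of the general technique, not the exact code in manage_documents.py:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks that overlap their neighbours.

    Overlap reduces the chance that a sentence relevant to a query is
    cut in half at a chunk boundary.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Each chunk is then embedded with text-embedding-3-small and upserted into the bot's namespace.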

Intended Automated Process (Incomplete):

- TrainDocumentJob.php_ (note: disabled with underscore suffix) was meant to:
  - Automatically process uploaded documents
  - Call a Python script from the Laravel queue
  - Update document status to trained or error
- Status: Not currently active/working


3. Unused Code That Can Be Safely Removed

A. Completely Unused Files

  1. lumi-backend/app/Jobs/TrainDocumentJob.php_
     - Status: Disabled (underscore suffix)
     - Purpose: Automate document training via Laravel queues
     - Issue: Never called, incomplete implementation, references non-existent scripts/train_document.py
     - Recommendation: Remove entirely OR implement properly (see Section 5)

  2. lumi-backend/scripts/ directory
     - Status: Empty except for __pycache__
     - Issue: Job references base_path('scripts/train_document.py') which doesn't exist
     - Recommendation: Remove directory or create a proper training script

  3. lumi-backend/chatbot/passenger_wsgi.py
     - Status: Contains placeholder "It works!" code
     - Purpose: Passenger WSGI entry point for Python app
     - Issue: Not used for Streamlit (which runs via streamlit run)
     - Recommendation: Remove if not using Passenger WSGI, OR implement properly for Streamlit

  4. lumi-backend/chatbot/public/serve-embed.py
     - Status: Simple HTTP server for testing embed files
     - Purpose: Development only
     - Recommendation: Safe to remove in production; useful for local testing

  5. lumi-backend/chatbot/public/examples.html (likely exists but not read)
     - Purpose: Demo page for embed widget
     - Recommendation: Keep for documentation, remove for production

  6. lumi-backend/chatbot/public/test.html
     - Purpose: Test page for embed widget
     - Recommendation: Remove in production deployment

  7. lumi-backend/test (empty file in root)
     - Recommendation: Remove

B. Unused API Routes

Route: Route::get('/bots', [BotController::class, 'index']);

- File: routes/api.php:24
- Issue: BotController has no index() method
- Result: Would cause a 500 error if called
- Recommendation: Remove route OR implement the index() method

C. Unused Database Columns

training_documents table:

- pinecone_id column (nullable, never populated)
- pinecone_metadata column (JSON, nullable, never populated)
- Recommendation: Remove in a migration if not planning to use, OR implement usage

D. Unused Laravel Features

  1. Laravel Sanctum
     - Installed but no authentication in use
     - API routes are public
     - Recommendation: Remove dependency if not planning auth, OR implement auth

  2. /api/user route
     - Protected by auth:sanctum but Sanctum not configured
     - Never used
     - Recommendation: Remove

  3. Welcome Blade View
     - resources/views/welcome.blade.php
     - Standard Laravel welcome page
     - Not used (Filament is the UI)
     - Recommendation: Remove

  4. Frontend Assets
     - resources/js/app.js, resources/js/bootstrap.js, resources/css/app.css
     - Vite configuration exists but is not used (Filament ships its own assets)
     - Recommendation: Remove if not building a custom frontend

E. Unused Python Dependencies

From requirements.txt:

- pytest, pytest-asyncio, pytest-socket, syrupy: testing frameworks (no tests found in the repo)
- langchain-tests: testing utilities for LangChain
- SQLAlchemy: database ORM (not used; Laravel handles the DB)
- FastAPI, uvicorn (if present in the full requirements): API framework (Streamlit is used instead)
- Recommendation: Remove testing dependencies from production requirements; keep them in a development requirements file
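One conventional way to separate the two, assuming package names implied by the imports in app.py (a requirements-dev.txt does not currently exist in the repo):

```text
# requirements.txt - production only
streamlit
langchain
langchain-openai
langchain-pinecone
pinecone
requests
python-dotenv

# requirements-dev.txt - production deps plus test tooling
-r requirements.txt
pytest
pytest-asyncio
pytest-socket
syrupy
langchain-tests
```

Production installs then use only `pip install -r requirements.txt`, while CI and local development use the dev file.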


4. Code That Can Be Simplified

A. app.py - Streamlit Chatbot

Current Issues:

  1. Pydantic Model Rebuild Hack (Lines 6-10)

from langchain.schema import BaseCache
from langchain.callbacks.manager import Callbacks
from langchain_openai import ChatOpenAI
ChatOpenAI.model_rebuild()

     - Issue: Workaround for Pydantic forward-reference issues
     - Simplification: Upgrade to a current LangChain release, or use the proper langchain_core imports

  2. Hardcoded Model Settings

llm = ChatOpenAI(model="gpt-4o", temperature=1, cache=None)

     - Issue: Model and temperature are hardcoded; should be configurable per bot
     - Simplification: Add model_name and temperature to the Bot model

  3. Redundant Session Key

session_key = f"messages_{bot_slug}"

     - Issue: If there is only one bot per session, it is simpler to use "messages"
     - Simplification: Use a single key if multi-bot sessions are not needed

  4. Mixing SystemMessage Types
     - Lines 64, 104: two different SystemMessage usages
     - Role prompt added at start (hidden)
     - Dynamic context prompt added per message (also hidden)
     - Simplification: Clarify the architecture - use a single system message built from the template

Simplified app.py Structure:

# Remove Pydantic hack when upgraded
# Make model/temperature configurable
# Use clearer message structure
# Add error handling for API calls
# Cache bot config to reduce API calls
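The last point, caching the bot config, can be done without extra dependencies. A minimal TTL-cache sketch (the wrapper name is illustrative; in Streamlit itself the same effect is available via the `@st.cache_data(ttl=...)` decorator):

```python
import time

def make_cached_fetcher(fetch, ttl_seconds: float = 60.0):
    """Wrap a fetch(slug) callable so repeated calls within the TTL
    reuse the previous result instead of hitting the Laravel API."""
    cache: dict[str, tuple[float, dict]] = {}

    def cached_fetch(slug: str) -> dict:
        now = time.monotonic()
        hit = cache.get(slug)
        if hit and now - hit[0] < ttl_seconds:
            return hit[1]  # still fresh, skip the network call
        config = fetch(slug)
        cache[slug] = (now, config)
        return config

    return cached_fetch
```

Because bot configs change rarely, even a short TTL removes one HTTP round-trip per chat message.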

B. manage_documents.py

Simplification Opportunities:

  1. Redundant Environment Variable Check
     - Lines 15-20: the check could be a function
     - Simplification: Create a validate_env() helper

  2. Unused Function
     - chunk_text() (lines 29-31) defined but never called
     - Simplification: Remove

  3. ASCII Namespace Sanitization
     - Lines 34-38: overly cautious for modern Pinecone
     - Simplification: Pinecone supports UTF-8; remove or simplify

  4. Argument Handling
     - --filter-value and --source serve the same purpose (lines 111, 122)
     - Simplification: Use a single parameter

Simplified Structure:

# Remove unused chunk_text
# Simplify CLI args
# Add batch upload support
# Add progress indicators
# Better error messages
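Collapsing --filter-value and --source into one flag could look like the following argparse sketch (the action choices and flag names other than --source are illustrative, not taken from the script):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """CLI with a single --source flag replacing the redundant
    --filter-value/--source pair."""
    parser = argparse.ArgumentParser(
        description="Manage Pinecone training documents"
    )
    parser.add_argument("action", choices=["upload", "delete", "list"])
    parser.add_argument("--namespace", required=True,
                        help="Pinecone namespace (one per bot)")
    parser.add_argument("--source",
                        help="Document source to act on (file path or URL)")
    return parser
```

Usage: `python manage_documents.py delete --namespace demo-bot --source faq.txt`.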

C. BotController.php

Current:

public function show($slug)
{
    $bot = \App\Models\Bot::where('slug', $slug)->firstOrFail();
    return response()->json([
        'name' => $bot->name,
        'role_prompt' => $bot->role_prompt,
        'pinecone_namespace' => $bot->pinecone_namespace,
    ]);
}

Issues:

  1. Hardcoded response structure that omits system_prompt_template
  2. The field is missing from the JSON response but used in app.py (line 45)
  3. Fully-qualified class name used instead of a use import

Simplified:

use App\Models\Bot;

public function show(string $slug): JsonResponse
{
    $bot = Bot::where('slug', $slug)->firstOrFail();
    return response()->json([
        'name' => $bot->name,
        'role_prompt' => $bot->role_prompt,
        'system_prompt_template' => $bot->system_prompt_template,
        'pinecone_namespace' => $bot->pinecone_namespace,
    ]);
}

D. Filament Resources

BotResource.php:

- Line 44: default(Bot::DEFAULT_SYSTEM_PROMPT) - this default is already applied by the model accessor
- Simplification: Remove the redundant default

TrainingDocumentResource.php:

- Lines 37-47: two separate fields for the same column depending on type, with complex reactive logic
- Simplification: Use a single polymorphic field or a custom field type

E. Shell Scripts

run_chatbot_fixed.sh vs run_chatbot.sh:

- The only difference is the parameters passed to streamlit run
- Simplification: Keep one script and use environment variables for the parameters


5. Deprecated/Suboptimal Code Requiring Replacement

A. ⚠️ CRITICAL: app.py Missing system_prompt_template in Response

File: lumi-backend/chatbot/app.py:45

Current Code:

system_prompt_template = bot.get("system_prompt_template")

Issue:

- The API endpoint doesn't return this field!
- It will always be None
- The app falls back to the hardcoded prompt (lines 98-103)

Impact: Bot's custom system prompt template is ignored!

Fix Required in BotController.php:

return response()->json([
    'name' => $bot->name,
    'role_prompt' => $bot->role_prompt,
    'system_prompt_template' => $bot->system_prompt_template, // ADD THIS
    'pinecone_namespace' => $bot->pinecone_namespace,
]);

B. ⚠️ Security Issues

1. Hardcoded URLs and IPs

File: chatbot-embed.js

chatUrl: 'http://151.106.62.241:8501',  // Line 23

File: QUICK-START.md, test.html

http://151.106.62.241:8501

Issues:

- Hardcoded IP address
- HTTP (not HTTPS)
- Breaks if the IP changes

Fix:

- Use an environment variable or relative URL
- Configure via embed init options
- Use HTTPS in production

2. No CORS Protection

File: app.py

- No CORS headers; accepts requests from any origin
- Fix: Add Streamlit CORS configuration or put the app behind a reverse proxy
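The reverse-proxy approach could be sketched as an nginx server block; hostnames are placeholders and TLS certificate directives are omitted:

```text
# nginx: expose Streamlit only through the proxy, to a known origin
server {
    listen 443 ssl;
    server_name chat.example.com;

    location / {
        proxy_pass http://127.0.0.1:8501;
        proxy_http_version 1.1;
        # Required for Streamlit's websocket connection
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        # Advertise only the site allowed to embed the widget
        add_header Access-Control-Allow-Origin "https://www.example.com";
    }
}
```

With this in place the Streamlit port (8501) can be firewalled off from direct access.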

3. No API Authentication

File: routes/api.php

- All endpoints are public; anyone can query bot configs
- Fix: Implement API key or token authentication

4. No Rate Limiting

File: routes/api.php

- No throttle middleware on the API routes
- Fix: Apply Laravel's built-in throttle middleware (e.g. throttle:60,1)

C. Database Schema Issues

1. No Soft Deletes

Impact: Deleting a bot permanently removes its training data.

Fix: Add soft deletes to bots and training_documents.

2. Missing Indexes

Tables: bots, training_documents

Missing indexes on:

- bots.slug (frequently queried; should be indexed)
- training_documents.status (for filtering)

(training_documents.bot_id is typically indexed automatically when the foreign key constraint is created, so it is not listed here.)

Fix:

$table->string('slug')->unique(); // unique() already creates an index
$table->string('status')->index();

D. Missing Error Handling

1. app.py - No API Error Handling

Lines 33-36:

resp = requests.get(f"{API_BASE}/bots/{bot_slug}")
if resp.status_code != 200:
    st.error(f'Bot "{bot_slug}" not found (HTTP {resp.status_code}).')
    st.stop()

Missing:

- Network error handling (timeout, connection error)
- JSON decode errors
- Pinecone connection errors
- OpenAI API errors

Fix:

try:
    resp = requests.get(f"{API_BASE}/bots/{bot_slug}", timeout=5)
    resp.raise_for_status()
    bot = resp.json()
except requests.RequestException as e:
    st.error(f"Failed to connect to bot service: {e}")
    st.stop()
except ValueError:
    st.error("Invalid response from bot service")
    st.stop()

2. manage_documents.py - Generic Exception Handling

Lines 135-142:

- Catches all exceptions
- Prints a raw traceback
- Not actionable for users

Fix:

- Use specific exception types
- Show user-friendly messages
- Return proper exit codes
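A minimal sketch of that pattern; the exit-code names and the specific exceptions caught are illustrative assumptions, not taken from the script:

```python
import sys

# Illustrative exit codes (assumed, not from manage_documents.py)
EXIT_OK = 0
EXIT_CONFIG_ERROR = 2
EXIT_UPLOAD_ERROR = 3

def run_step(step) -> int:
    """Run one upload step, translating failures into friendly messages
    and distinct exit codes instead of dumping a raw traceback."""
    try:
        step()
        return EXIT_OK
    except KeyError as exc:
        print(f"Missing environment variable: {exc}", file=sys.stderr)
        return EXIT_CONFIG_ERROR
    except ConnectionError as exc:
        print(f"Could not reach Pinecone: {exc}", file=sys.stderr)
        return EXIT_UPLOAD_ERROR
```

Distinct exit codes make the script usable from shell pipelines and, eventually, from the Laravel queue job.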

E. Deprecated Patterns

1. Direct Model Access in Controller

File: BotController.php:12

$bot = \App\Models\Bot::where('slug', $slug)->firstOrFail();

Issue: Not following repository pattern per Laravel rules

Fix:

// Create app/Repositories/BotRepository.php
class BotRepository {
    public function findBySlug(string $slug): Bot {
        return Bot::where('slug', $slug)->firstOrFail();
    }
}

// Inject in controller
public function __construct(private BotRepository $botRepository) {}

public function show(string $slug) {
    $bot = $this->botRepository->findBySlug($slug);
    // ...
}

2. Missing Type Declarations

File: BotController.php

Current:

public function show($slug)

Should be (PHP 8.2+):

public function show(string $slug): JsonResponse

3. Missing declare(strict_types=1);

All PHP files should start with:

<?php

declare(strict_types=1);

namespace App\...

This is consistent with PSR-12 formatting conventions and common Laravel practice (PSR-12 standardizes how declare statements are written; the strict_types requirement itself comes from the project's own rules).

F. Suboptimal LangChain Usage

1. Not Using LangChain Chains

Current: Manual message construction and LLM calls

Better: Use a prebuilt chain such as ConversationalRetrievalChain (marked legacy in recent LangChain releases in favour of create_retrieval_chain, but still a clear improvement over manual wiring)

from langchain.chains import ConversationalRetrievalChain

chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=auto_retriever,
    return_source_documents=True,
)

result = chain.invoke({"question": user_input, "chat_history": chat_history})

2. Not Using Memory Properly

Current: Storing full message list in Streamlit session

Better: Use LangChain memory with summarization

from langchain.memory import ConversationSummaryMemory

memory = ConversationSummaryMemory(llm=llm)

3. No Caching of Embeddings

Issue: Every retrieval creates new embedding of query

Fix: Cache embeddings or use LangChain's built-in caching
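A minimal query-embedding cache, assuming an embed(text) callable such as OpenAIEmbeddings.embed_query (the wrapper name is illustrative):

```python
import hashlib

def make_caching_embedder(embed):
    """Memoize query embeddings so repeated identical questions do not
    trigger a new OpenAI embeddings call."""
    cache: dict[str, list[float]] = {}

    def cached_embed(text: str) -> list[float]:
        # Hash the text so arbitrarily long queries make compact keys
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in cache:
            cache[key] = embed(text)
        return cache[key]

    return cached_embed
```

LangChain also ships a CacheBackedEmbeddings wrapper that implements the same idea with pluggable storage backends.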


6. Containerization Strategy

Current Deployment (cPanel)

Given your cPanel deployment and the hybrid PHP/Python stack, here are options from simplest to most robust:

Option 1: Lando

Reasoning:

- Lando is primarily for local development
- Not designed for production deployment
- Doesn't work well with cPanel
- Overkill for this project

Verdict: Skip Lando.


Option 2: Docker Compose (Recommended)

Why this is the simplest:

- Single docker-compose.yml file
- Works alongside cPanel if Docker is available
- Easy to manage both PHP and Python
- Can run on any VPS

Structure:

# docker-compose.yml
version: '3.8'

services:
  # Laravel Backend
  laravel:
    build:
      context: ./lumi-backend
      dockerfile: Dockerfile.laravel
    ports:
      - "8000:8000"
    volumes:
      - ./lumi-backend:/var/www/html
    environment:
      - DB_CONNECTION=sqlite
      - DB_DATABASE=/var/www/html/database/database.sqlite
    depends_on:
      - chatbot

  # Python Chatbot
  chatbot:
    build:
      context: ./lumi-backend/chatbot
      dockerfile: Dockerfile.chatbot
    ports:
      - "8501:8501"
    environment:
      - PINECONE_API_KEY=${PINECONE_API_KEY}
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - PINECONE_INDEX_NAME=${PINECONE_INDEX_NAME}
      - LARAVEL_API_BASE_URL=http://laravel:8000/api
    volumes:
      - ./lumi-backend/chatbot:/app

  # Optional: Nginx reverse proxy
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
    depends_on:
      - laravel
      - chatbot

Dockerfiles:

lumi-backend/Dockerfile.laravel:

FROM php:8.2-fpm

# Install dependencies
RUN apt-get update && apt-get install -y \
    git \
    curl \
    libpng-dev \
    libonig-dev \
    libxml2-dev \
    zip \
    unzip \
    sqlite3 \
    libsqlite3-dev

# Install PHP extensions
RUN docker-php-ext-install pdo pdo_sqlite mbstring exif pcntl bcmath gd

# Install Composer
COPY --from=composer:latest /usr/bin/composer /usr/bin/composer

# Set working directory
WORKDIR /var/www/html

# Copy application
COPY . .

# Install PHP dependencies
RUN composer install --no-dev --optimize-autoloader

# Set permissions
RUN chown -R www-data:www-data /var/www/html

# Expose port
EXPOSE 8000

# Run Laravel
CMD php artisan serve --host=0.0.0.0 --port=8000

lumi-backend/chatbot/Dockerfile.chatbot:

FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Copy requirements
COPY requirements.txt .

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy application
COPY . .

# Expose Streamlit port
EXPOSE 8501

# Run Streamlit
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0", "--server.headless=true"]

Pros:

- ✅ Simple to understand
- ✅ Single command: docker-compose up
- ✅ Consistent environments
- ✅ Easy to scale
- ✅ Works on any host with Docker

Cons:

- ❌ cPanel may not support Docker (depends on host)
- ❌ Requires a VPS or dedicated server


Option 3: Manual Process Documentation (If No Docker)

If you must stay on cPanel without Docker:

Create deployment script:

deploy.sh:

#!/bin/bash

# Laravel deployment
cd lumi-backend
composer install --no-dev --optimize-autoloader
php artisan migrate --force
php artisan config:cache
php artisan route:cache

# Python deployment
cd chatbot
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Restart services
touch tmp/restart.txt  # For Passenger

Document environment variables in .env.example


Option 4: GitHub Actions + Manual Deploy

Even simpler - automate deployment via GitHub:

.github/workflows/deploy.yml:

name: Deploy to cPanel

on:
  push:
    branches: [ main ]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Deploy to cPanel via FTP
        uses: SamKirkland/FTP-Deploy-Action@4.3.0
        with:
          server: ${{ secrets.FTP_SERVER }}
          username: ${{ secrets.FTP_USERNAME }}
          password: ${{ secrets.FTP_PASSWORD }}
          local-dir: ./lumi-backend/
          server-dir: /public_html/

Final Recommendation for Containerization:

If you have Docker access: → Use Docker Compose (Option 2)

If stuck on cPanel without Docker: → Document deployment process and create deployment scripts (Option 3)

For future: → Consider migrating to a VPS (DigitalOcean, Linode, AWS) for better control


7. Code Violating Your Rules & Suggested Changes

Violations Against Your Custom Rules

A. Laravel Rules Violations

1. Missing declare(strict_types=1);

Violates: Laravel rules - "Use strict typing"

Files affected: - app/Models/Bot.php - app/Models/TrainingDocument.php - app/Http/Controllers/Api/BotController.php - All Filament resources

Fix for each file:

<?php

declare(strict_types=1);

namespace App\...

2. Not Following Repository Pattern

Violates: Laravel best practices - "Implement Repository pattern for data access layer"

File: BotController.php

Current:

$bot = \App\Models\Bot::where('slug', $slug)->firstOrFail();

Fix: Create app/Repositories/BotRepository.php:

<?php

declare(strict_types=1);

namespace App\Repositories;

use App\Models\Bot;

class BotRepository
{
    public function findBySlug(string $slug): Bot
    {
        return Bot::where('slug', $slug)->firstOrFail();
    }

    public function all()
    {
        return Bot::all();
    }
}

Update BotController.php:

<?php

declare(strict_types=1);

namespace App\Http\Controllers\Api;

use App\Http\Controllers\Controller;
use App\Repositories\BotRepository;
use Illuminate\Http\JsonResponse;

class BotController extends Controller
{
    public function __construct(
        private BotRepository $botRepository
    ) {}

    public function show(string $slug): JsonResponse
    {
        $bot = $this->botRepository->findBySlug($slug);

        return response()->json([
            'name' => $bot->name,
            'role_prompt' => $bot->role_prompt,
            'system_prompt_template' => $bot->system_prompt_template,
            'pinecone_namespace' => $bot->pinecone_namespace,
        ]);
    }

    public function index(): JsonResponse
    {
        $bots = $this->botRepository->all();
        return response()->json($bots);
    }
}

3. Missing Type Hints

Violates: "Use descriptive variable and method names" + strict typing

Examples: - BotController::show($slug) → should be show(string $slug): JsonResponse - Model properties should use typed properties (PHP 8.1+)

Fix for Bot model:

<?php

declare(strict_types=1);

namespace App\Models;

use Illuminate\Database\Eloquent\Factories\HasFactory;
use Illuminate\Database\Eloquent\Model;
use Illuminate\Database\Eloquent\Relations\HasMany;

class Bot extends Model
{
    use HasFactory;

    protected $fillable = [
        'name',
        'slug',
        'role_prompt',
        'system_prompt_template',
        'pinecone_namespace',
    ];

    public const DEFAULT_SYSTEM_PROMPT = "Use the following pieces of retrieved context to answer the question.\n" .
        "If you don't know the answer, just say that you don't know.\n" .
        "Keep responses concise (three sentences max).\n\n" .
        "Context:\n{context}";

    public function getSystemPromptTemplateAttribute(?string $value): string
    {
        return $value ?? self::DEFAULT_SYSTEM_PROMPT;
    }

    public function trainingDocuments(): HasMany
    {
        return $this->hasMany(TrainingDocument::class);
    }
}

4. No Request Validation

Violates: "Use Laravel's validation features for form and request validation"

Issue: No validation on API endpoints

Fix: Create Form Request:

<?php

declare(strict_types=1);

namespace App\Http\Requests;

use Illuminate\Foundation\Http\FormRequest;

class ShowBotRequest extends FormRequest
{
    public function authorize(): bool
    {
        return true; // Or implement auth logic
    }

    public function rules(): array
    {
        return [
            'slug' => ['required', 'string', 'exists:bots,slug'],
        ];
    }

    protected function prepareForValidation(): void
    {
        // Route parameters are not validated automatically; merge the slug in
        $this->merge(['slug' => $this->route('slug')]);
    }
}

Use in controller:

public function show(ShowBotRequest $request, string $slug): JsonResponse
{
    $bot = $this->botRepository->findBySlug($slug);
    // ...
}

5. No API Versioning

Violates: "Implement API versioning for public APIs"

Current: /api/bots/{slug}

Should be: /api/v1/bots/{slug}

Fix in routes/api.php:

Route::prefix('v1')->group(function () {
    Route::get('/bots', [BotController::class, 'index']);
    Route::get('/bots/{slug}', [BotController::class, 'show']);
});

6. No Logging

Violates: "Implement proper error logging and monitoring"

Fix: Add logging to critical operations:

use Illuminate\Support\Facades\Log;

public function show(string $slug): JsonResponse
{
    try {
        $bot = $this->botRepository->findBySlug($slug);
        Log::info("Bot accessed", ['slug' => $slug]);
        return response()->json([...]);
    } catch (\Exception $e) {
        Log::error("Failed to fetch bot", [
            'slug' => $slug,
            'error' => $e->getMessage()
        ]);
        throw $e;
    }
}

7. No CSRF Protection on API

Violates: "Implement proper CSRF protection"

Current: API routes don't use CSRF (standard for APIs)

If needed: Add Sanctum token authentication


B. Python/LangChain Rules Violations

1. Missing Module Docstrings

Violates: "Follow PEP8 with docstrings"

Files: app.py, manage_documents.py

Fix for app.py:

#!/usr/bin/env python3
"""
Lumi Chatbot - Streamlit Application

This module provides the web interface for the Lumi chatbot system.
It integrates with Laravel backend for bot configuration and uses
LangChain + Pinecone for retrieval-augmented generation.

Environment Variables:
    PINECONE_API_KEY: Pinecone authentication key
    OPENAI_API_KEY: OpenAI API key
    PINECONE_INDEX_NAME: Target Pinecone index
    LARAVEL_API_BASE_URL: Laravel backend URL (default: http://localhost:8000/api)
"""

import os
from typing import Dict, List, Any
import streamlit as st
import requests
from dotenv import load_dotenv
# ...rest of imports

2. No Type Hints

Violates: "Use PEP8 and type hints in Python"

Current functions have no type hints

Fix:

from typing import Dict, List, Any, Optional

def fetch_bot_config(bot_slug: str, api_base: str) -> Dict[str, Any]:
    """
    Fetch bot configuration from Laravel API.

    Args:
        bot_slug: Unique identifier for the bot
        api_base: Base URL of the Laravel API

    Returns:
        Dictionary containing bot configuration

    Raises:
        requests.RequestException: If API call fails
        ValueError: If response is invalid
    """
    resp = requests.get(f"{api_base}/bots/{bot_slug}", timeout=5)
    resp.raise_for_status()
    return resp.json()

3. Hardcoded API Keys

Violates: "Never expose secrets; use .env files"

Issue: While using os.environ, there's no .env.example in chatbot directory

Fix: Create lumi-backend/chatbot/.env.example:

# OpenAI Configuration
OPENAI_API_KEY=sk-...

# Pinecone Configuration
PINECONE_API_KEY=pc-...
PINECONE_INDEX_NAME=lumi-chatbot
PINECONE_ENVIRONMENT=us-east-1-aws

# Laravel Backend
LARAVEL_API_BASE_URL=http://localhost:8000/api

4. Not Abstracting LangChain Logic

Violates: Custom rules - "Keep chain creation abstracted in chains.py"

Issue: All LangChain logic in app.py

Fix: Create modular structure:

lumi-backend/chatbot/langchain_logic/chains.py:

"""LangChain chain configurations."""

from typing import List
from langchain_openai import ChatOpenAI
from langchain_core.messages import BaseMessage


def create_chat_llm(model: str = "gpt-4o", temperature: float = 1.0) -> ChatOpenAI:
    """Create and configure ChatOpenAI instance."""
    return ChatOpenAI(
        model=model,
        temperature=temperature,
        cache=None
    )


def generate_response(llm: ChatOpenAI, messages: List[BaseMessage]) -> str:
    """
    Generate response from LLM.

    Args:
        llm: Configured ChatOpenAI instance
        messages: List of conversation messages

    Returns:
        Generated response text
    """
    return llm.invoke(messages).content

lumi-backend/chatbot/langchain_logic/pinecone_client.py:

"""Pinecone vector store client."""

import os
from pinecone import Pinecone
from langchain_openai import OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore
from langchain_core.vectorstores import VectorStoreRetriever


def initialize_pinecone_store(namespace: str) -> PineconeVectorStore:
    """
    Initialize Pinecone vector store for given namespace.

    Args:
        namespace: Pinecone namespace identifier

    Returns:
        Configured PineconeVectorStore instance
    """
    pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
    index = pc.Index(os.environ["PINECONE_INDEX_NAME"])

    emb = OpenAIEmbeddings(
        model="text-embedding-3-small",
        api_key=os.environ.get("OPENAI_API_KEY")
    )

    return PineconeVectorStore(
        index=index,
        embedding=emb,
        namespace=namespace
    )


def create_retriever(
    vector_store: PineconeVectorStore,
    k: int = 3,
    score_threshold: float = 0.5
) -> VectorStoreRetriever:
    """
    Create retriever from vector store.

    Args:
        vector_store: Pinecone vector store instance
        k: Number of documents to retrieve
        score_threshold: Minimum similarity score

    Returns:
        Configured retriever
    """
    return vector_store.as_retriever(
        search_type="similarity_score_threshold",
        search_kwargs={"k": k, "score_threshold": score_threshold},
    )

Then simplify app.py:

from langchain_logic.chains import create_chat_llm, generate_response
from langchain_logic.pinecone_client import initialize_pinecone_store, create_retriever

5. No Logging in Python

Violates: "Log all interactions" (from LangChain rules)

Fix:

import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

# In code:
logger.info(f"Bot '{bot_slug}' loaded successfully")
logger.debug(f"Retrieved {len(docs)} documents from Pinecone")
logger.error(f"Failed to fetch bot config: {e}")
6. No Async Methods

Violates: LangChain rules - "Use async methods when available"

Current: Synchronous calls blocking Streamlit

Fix: Use async for API calls and LangChain:

import asyncio
from langchain_openai import ChatOpenAI

async def afetch_bot_config(bot_slug: str) -> Dict:
    """Async fetch bot config."""
    # Use aiohttp instead of requests
    pass

async def agenerate_response(llm: ChatOpenAI, messages: List) -> str:
    """Async generate response."""
    return (await llm.ainvoke(messages)).content

C. Architecture Violations

1. Tight Coupling Between Laravel and Python

Violates: "Keep backend (Laravel), AI logic (LangChain), and UI (Streamlit) modular and independent"

Issue: - Streamlit directly calls Laravel API - No abstraction layer - Hard to test

Fix: Create API client abstraction:

lumi-backend/chatbot/api/laravel_client.py:

"""Laravel API client."""

import os
from typing import Dict, Optional
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


class LaravelAPIClient:
    """Client for interacting with Laravel backend."""

    def __init__(self, base_url: Optional[str] = None):
        """
        Initialize API client.

        Args:
            base_url: Laravel API base URL (default from env)
        """
        self.base_url = base_url or os.environ.get(
            "LARAVEL_API_BASE_URL",
            "http://localhost:8000/api"
        )
        self.session = self._create_session()

    def _create_session(self) -> requests.Session:
        """Create requests session with retry logic."""
        session = requests.Session()
        retry = Retry(
            total=3,
            backoff_factor=0.3,
            status_forcelist=[500, 502, 503, 504]
        )
        adapter = HTTPAdapter(max_retries=retry)
        session.mount("http://", adapter)
        session.mount("https://", adapter)
        return session

    def get_bot(self, slug: str) -> Dict:
        """
        Fetch bot configuration.

        Args:
            slug: Bot slug identifier

        Returns:
            Bot configuration dictionary

        Raises:
            requests.RequestException: If request fails
        """
        url = f"{self.base_url}/bots/{slug}"
        response = self.session.get(url, timeout=5)
        response.raise_for_status()
        return response.json()

Use in app.py:

from api.laravel_client import LaravelAPIClient

api_client = LaravelAPIClient()
bot = api_client.get_bot(bot_slug)
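
One payoff of the abstraction is testability: with the session swappable, get_bot can be unit-tested without a running Laravel server. A sketch using a trimmed, hypothetical copy of the client in which the session is injected for the test:

```python
from unittest.mock import MagicMock


class LaravelAPIClient:
    """Trimmed copy of the client above, with the session injectable."""

    def __init__(self, session, base_url: str = "http://localhost:8000/api"):
        self.base_url = base_url
        self.session = session

    def get_bot(self, slug: str) -> dict:
        response = self.session.get(f"{self.base_url}/bots/{slug}", timeout=5)
        response.raise_for_status()
        return response.json()


# Fake the HTTP layer: no network involved.
fake_response = MagicMock()
fake_response.json.return_value = {"slug": "demo", "name": "Demo Bot"}
session = MagicMock()
session.get.return_value = fake_response

client = LaravelAPIClient(session=session)
bot = client.get_bot("demo")
```

The same pattern works against the real class by assigning a mock to `client.session` after construction.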

2. No Environment Validation

Violates: "Validate Pinecone init on startup" (Pinecone rules)

Fix: Create startup validation:

import os
import logging

from pinecone import Pinecone

logger = logging.getLogger(__name__)


def validate_environment() -> None:
    """Validate required environment variables and connections."""
    required_vars = [
        "PINECONE_API_KEY",
        "OPENAI_API_KEY",
        "PINECONE_INDEX_NAME",
    ]

    missing = [var for var in required_vars if not os.getenv(var)]
    if missing:
        raise EnvironmentError(
            f"Missing required environment variables: {', '.join(missing)}"
        )

    # Test Pinecone connection
    try:
        pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
        index = pc.Index(os.environ["PINECONE_INDEX_NAME"])
        index.describe_index_stats()
        logger.info("Pinecone connection validated")
    except Exception as e:
        raise ConnectionError(f"Failed to connect to Pinecone: {e}") from e

# Call on startup
validate_environment()
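
The fail-fast behaviour can be exercised in isolation. In this sketch, `check_required_vars` is a hypothetical extraction of just the variable check above (the Pinecone probe is omitted because it needs live credentials):

```python
import os
from typing import List


def check_required_vars(required: List[str]) -> None:
    """Raise if any required environment variable is unset."""
    missing = [var for var in required if not os.getenv(var)]
    if missing:
        raise EnvironmentError(
            f"Missing required environment variables: {', '.join(missing)}"
        )


# Simulate a partially configured environment.
os.environ["PINECONE_API_KEY"] = "test-key"
os.environ.pop("OPENAI_API_KEY", None)

try:
    check_required_vars(["PINECONE_API_KEY", "OPENAI_API_KEY"])
    error_message = ""
except EnvironmentError as exc:
    error_message = str(exc)
```

The error names every missing variable at once, which is friendlier for deployment than failing on the first lookup deep inside a request handler.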

Summary of Required Changes by Priority

🔴 CRITICAL (Fix Immediately)

  1. Add system_prompt_template to BotController response - Currently broken
  2. Implement error handling in app.py - Will crash on network errors
  3. Fix hardcoded IP in chatbot-embed.js - Won't work when IP changes
  4. Remove/fix broken /api/bots route - Causes 500 error

🟡 HIGH PRIORITY (Fix Soon)

  1. Add declare(strict_types=1) to all PHP files - PSR-12 compliance
  2. Implement Repository pattern - Laravel best practice
  3. Add type hints to Python code - PEP8 compliance
  4. Create modular LangChain structure - Project rules
  5. Add API versioning - Future-proofing
  6. Add rate limiting - Cost/security

🟢 MEDIUM PRIORITY (Improvement)

  1. Remove unused code (see Section 3)
  2. Add logging - Debugging and monitoring
  3. Implement TrainDocumentJob properly - Automation
  4. Add tests - Quality assurance
  5. Create .env.example files - Documentation

🔵 LOW PRIORITY (Nice to Have)

  1. Containerize with Docker - Deployment
  2. Add Sanctum authentication - Security
  3. Implement async methods - Performance
  4. Add soft deletes - Data safety
  5. Create API documentation - Developer experience

Conclusion

This chatbot system works end to end, but it carries significant technical debt and architectural issues that should be addressed:

  1. Critical Bug: System prompt template not being sent from API
  2. Security: No authentication, hardcoded IPs, no rate limiting
  3. Code Quality: Missing type hints, no repository pattern, unused code
  4. Architecture: Tight coupling, no modular structure
  5. Deployment: Manual process, no containerization

Recommended Next Steps:

  1. Fix the critical API bug (missing system_prompt_template)
  2. Implement proper error handling
  3. Add type declarations across the codebase
  4. Refactor to a modular architecture
  5. Containerize with Docker Compose
  6. Add authentication and rate limiting

The codebase shows good potential but needs significant refactoring to meet production quality standards and follow Laravel/Python best practices.


End of Report