# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Build and Run Commands

```bash
# Install dependencies
pip install -r requirements.txt

# Run the Streamlit application
streamlit run app.py

# Run tests (if a tests/ directory exists)
pytest -q
```

## Architecture Overview

This is a Streamlit-based traffic safety analysis system with a layered architecture:

### Layer Structure

```text
app.py (Main Entry & UI Orchestration)
    ↓
ui_sections/ (UI Components - render_* functions)
    ↓
services/ (Business Logic)
    ↓
config/settings.py (Configuration)
```
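
The call chain above can be sketched with hypothetical function bodies; only the module roles and the `render_*` naming convention come from this file, everything else is illustrative:

```python
from __future__ import annotations

# Hypothetical sketch of the layered call chain; real function names
# in this repo may differ except for the render_* convention.

def load_settings() -> dict:
    # config/settings.py: constants consumed by the services layer
    return {"MIN_PRE_DAYS": 30}

def evaluate_strategy(daily_counts: list[int], settings: dict) -> float:
    # services/: pure business logic, no Streamlit calls
    return sum(daily_counts) / max(len(daily_counts), 1)

def render_strategy_eval(daily_counts: list[int]) -> str:
    # ui_sections/: a render_* function that delegates to services
    score = evaluate_strategy(daily_counts, load_settings())
    return f"score={score:.2f}"

# app.py routes the active tab to the matching render_* function
print(render_strategy_eval([1, 2, 3]))  # score=2.00
```

Keeping Streamlit calls out of `services/` is what makes the business logic testable without a running app.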

### Data Flow

  1. Input: Two Excel files uploaded via the Streamlit sidebar: 事故数据 (accident data) and 策略数据 (strategy data)
  2. Processing: services/io.py handles loading, column aliasing, and cleaning
  3. Aggregation: Data aggregated to daily time series with aggregate_daily_data()
  4. Analysis: Various services process the aggregated data
  5. Output: Interactive Plotly charts, CSV exports, AI-generated reports
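
Step 3's daily aggregation can be illustrated with a stdlib-only sketch (the real `aggregate_daily_data()` presumably operates on pandas DataFrames loaded by `services/io.py`; `records` here is illustrative):

```python
from __future__ import annotations
from collections import Counter
from datetime import date

def aggregate_daily_data(timestamps: list[date]) -> dict[date, int]:
    """Count records per calendar day, sorted chronologically.

    Stdlib stand-in for the repo's pandas-based implementation.
    """
    return dict(sorted(Counter(timestamps).items()))

records = [date(2024, 1, 1), date(2024, 1, 1), date(2024, 1, 2)]
print(aggregate_daily_data(records))
# {datetime.date(2024, 1, 1): 2, datetime.date(2024, 1, 2): 1}
```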

## Key Services

| Module | Purpose |
| --- | --- |
| `services/io.py` | Data loading, column normalization (`COLUMN_ALIASES`), region inference |
| `services/forecast.py` | ARIMA grid search, KNN counterfactual, GLM/SVR extrapolation |
| `services/strategy.py` | Strategy effectiveness evaluation (F1/F2 metrics, safety states) |
| `services/hotspot.py` | Location extraction, risk scoring, strategy generation |
| `services/metrics.py` | Model evaluation metrics (RMSE, MAE) |
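
`services/metrics.py` computes RMSE and MAE; the standard formulas, as a stdlib sketch (the repo's actual signatures may differ):

```python
from __future__ import annotations
import math

def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root-mean-square error: sqrt(mean((t - p)^2))."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true: list[float], y_pred: list[float]) -> float:
    """Mean absolute error: mean(|t - p|)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

print(round(rmse([1, 2, 3], [1, 2, 5]), 4))  # 1.1547
print(round(mae([1, 2, 3], [1, 2, 5]), 4))   # 0.6667
```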

## UI Sections

Each tab in the app corresponds to a `render_*` function in `ui_sections/`:

- `render_overview`: KPI dashboard and time series visualization
- `render_forecast`: Multi-model prediction comparison
- `render_model_eval`: Model accuracy metrics
- `render_strategy_eval`: Single strategy evaluation
- `render_hotspot`: Accident hotspot analysis with risk levels

## Session State Pattern

The app uses `st.session_state['processed_data']` to persist:

- Loaded DataFrames (`combined_city`, `combined_by_region`, `accident_records`)
- Filter state (`region_sel`, `date_range`, `strat_filter`)
- Derived metadata (`all_regions`, `all_strategy_types`, `min_date`, `max_date`)
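
The pattern can be sketched with a plain dict standing in for `st.session_state` (key names from the list above; the helper name is hypothetical):

```python
from __future__ import annotations

def ensure_processed_data(session_state: dict) -> dict:
    """Create the 'processed_data' bucket on first run; reuse it afterwards.

    A plain dict stands in for st.session_state here; in the app the same
    lookup survives Streamlit reruns within one browser session.
    """
    if "processed_data" not in session_state:
        session_state["processed_data"] = {
            "combined_city": None,  # loaded DataFrame (placeholder)
            "region_sel": [],       # filter state
            "all_regions": [],      # derived metadata
        }
    return session_state["processed_data"]

state: dict = {}
ensure_processed_data(state)["region_sel"].append("Region A")
print(ensure_processed_data(state)["region_sel"])  # ['Region A']
```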

## AI Integration

Uses the DeepSeek API (OpenAI-compatible) to generate analysis reports. Configuration is entered in the sidebar:

- Base URL: `https://api.deepseek.com`
- Model: `deepseek-chat`
- Streaming response rendered incrementally
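
A sketch of the integration, assuming the `openai` Python SDK; the prompt wording and function names are hypothetical, only the base URL and model name come from this file:

```python
from __future__ import annotations
from typing import Iterator

def build_report_messages(summary: str) -> list[dict]:
    """Assemble the chat prompt (wording here is illustrative)."""
    return [
        {"role": "system", "content": "You are a traffic-safety analyst."},
        {"role": "user", "content": summary},
    ]

def stream_report(summary: str, api_key: str) -> Iterator[str]:
    # Requires the `openai` package; imported lazily so the sketch
    # stays importable without it.
    from openai import OpenAI

    client = OpenAI(base_url="https://api.deepseek.com", api_key=api_key)
    stream = client.chat.completions.create(
        model="deepseek-chat",
        messages=build_report_messages(summary),
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            yield delta  # feed to the UI for incremental rendering
```

In the app, something like `st.write_stream(stream_report(...))` (or a manual placeholder loop) would render the chunks as they arrive.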

## Coding Conventions

- Python 3.8+ with type hints (`from __future__ import annotations`)
- Functions/variables: `snake_case`; classes: `PascalCase`; constants: `UPPER_SNAKE_CASE`
- Use `@st.cache_data` for expensive computations
- Column aliases defined in the `COLUMN_ALIASES` dict for flexible Excel input
- Prefer pandas vectorization over loops
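
The alias mechanism can be sketched as follows; the alias pairs below are illustrative, not the repo's actual `COLUMN_ALIASES` dict:

```python
from __future__ import annotations

# Illustrative alias pairs; the real COLUMN_ALIASES lives in
# services/io.py and may map different names.
COLUMN_ALIASES = {
    "时间": "事故时间",
    "街道": "所在街道",
    "类型": "事故类型",
}

def normalize_columns(columns: list[str]) -> list[str]:
    """Map recognized aliases to canonical names; pass others through."""
    return [COLUMN_ALIASES.get(col, col) for col in columns]

print(normalize_columns(["时间", "所在街道", "备注"]))
# ['事故时间', '所在街道', '备注']
```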

## Data Format Requirements

The accident data Excel must contain the following columns (or aliases of them):

- 事故时间 (accident time)
- 所在街道 (street/region)
- 事故类型 (accident type: 财损 property damage / 伤人 injury / 亡人 fatality)

The strategy data Excel must contain:

- 发布时间 (publish date)
- 交通策略类型 (traffic strategy type)
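
A minimal validation sketch based on the required columns above (the validator name is hypothetical; in the app this check would run after `COLUMN_ALIASES` normalization):

```python
from __future__ import annotations

# Required columns taken from the lists above.
REQUIRED_ACCIDENT_COLUMNS = {"事故时间", "所在街道", "事故类型"}
REQUIRED_STRATEGY_COLUMNS = {"发布时间", "交通策略类型"}

def missing_columns(columns: list[str], required: set[str]) -> set[str]:
    """Return required columns absent from an uploaded sheet."""
    return required - set(columns)

print(missing_columns(["事故时间", "所在街道"], REQUIRED_ACCIDENT_COLUMNS))
# {'事故类型'}
```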

## Configuration (`config/settings.py`)

Key parameters:

- `ARIMA_P`/`ARIMA_D`/`ARIMA_Q`: Grid search ranges for ARIMA
- `MIN_PRE_DAYS` / `MAX_PRE_DAYS`: Historical data requirements
- `ANOMALY_CONTAMINATION`: Isolation Forest contamination rate
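
The shape of `config/settings.py` might look like this; the values below are placeholders, not the repo's actual settings:

```python
# Illustrative settings module; values are placeholders.
ARIMA_P = range(0, 3)         # grid-search range for the AR order p
ARIMA_D = range(0, 2)         # grid-search range for the differencing order d
ARIMA_Q = range(0, 3)         # grid-search range for the MA order q
MIN_PRE_DAYS = 30             # minimum pre-period history required
MAX_PRE_DAYS = 180            # cap on pre-period history used
ANOMALY_CONTAMINATION = 0.05  # IsolationForest contamination rate
```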