# CLAUDE.md This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. ## Build and Run Commands ```bash # Install dependencies pip install -r requirements.txt # Run the Streamlit application streamlit run app.py # Run tests (if tests/ directory exists) pytest -q ``` ## Architecture Overview This is a Streamlit-based traffic safety analysis system with a three-layer architecture: ### Layer Structure ``` app.py (Main Entry & UI Orchestration) ↓ ui_sections/ (UI Components - render_* functions) ↓ services/ (Business Logic) ↓ config/settings.py (Configuration) ``` ### Data Flow 1. **Input**: Excel files uploaded via Streamlit sidebar (事故数据 + 策略数据) 2. **Processing**: `services/io.py` handles loading, column aliasing, and cleaning 3. **Aggregation**: Data aggregated to daily time series with `aggregate_daily_data()` 4. **Analysis**: Various services process the aggregated data 5. **Output**: Interactive Plotly charts, CSV exports, AI-generated reports ### Key Services | Module | Purpose | |--------|---------| | `services/io.py` | Data loading, column normalization (COLUMN_ALIASES), region inference | | `services/forecast.py` | ARIMA grid search, KNN counterfactual, GLM/SVR extrapolation | | `services/strategy.py` | Strategy effectiveness evaluation (F1/F2 metrics, safety states) | | `services/hotspot.py` | Location extraction, risk scoring, strategy generation | | `services/metrics.py` | Model evaluation metrics (RMSE, MAE) | ### UI Sections Each tab in the app corresponds to a `render_*` function in `ui_sections/`: - `render_overview`: KPI dashboard and time series visualization - `render_forecast`: Multi-model prediction comparison - `render_model_eval`: Model accuracy metrics - `render_strategy_eval`: Single strategy evaluation - `render_hotspot`: Accident hotspot analysis with risk levels ### Session State Pattern The app uses `st.session_state['processed_data']` to persist: - Loaded DataFrames (`combined_city`, `combined_by_region`, `accident_records`) - Filter state (`region_sel`, `date_range`, `strat_filter`) - Derived metadata (`all_regions`, `all_strategy_types`, `min_date`, `max_date`) ### AI Integration Uses DeepSeek API (OpenAI-compatible) for generating analysis reports. Configuration in sidebar: - Base URL: `https://api.deepseek.com` - Model: `deepseek-chat` - Streaming response rendered incrementally ## Coding Conventions - Python 3.8+ with type hints (`from __future__ import annotations`) - Functions/variables: `snake_case`; Classes: `PascalCase`; Constants: `UPPER_SNAKE_CASE` - Use `@st.cache_data` for expensive computations - Column aliases defined in `COLUMN_ALIASES` dict for flexible Excel input - Prefer pandas vectorization over loops ## Data Format Requirements **Accident Data Excel** must contain (or aliases of): - `事故时间` (accident time) - `所在街道` (street/region) - `事故类型` (accident type: 财损/伤人/亡人) **Strategy Data Excel** must contain: - `发布时间` (publish date) - `交通策略类型` (strategy type) ## Configuration (config/settings.py) Key parameters: - `ARIMA_P/D/Q`: Grid search ranges for ARIMA - `MIN_PRE_DAYS` / `MAX_PRE_DAYS`: Historical data requirements - `ANOMALY_CONTAMINATION`: Isolation Forest contamination rate