add: plots

2026-01-17 11:26:04 +08:00
parent cab3bdc14e
commit fb63887ff9
7 changed files with 395 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -1,2 +1,3 @@
 latex-template/
 .DS_Store
 ~$*.*
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1,114 @@
 # CLAUDE.md
 This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
 ## Project Overview
 This is a mathematical modeling competition (MCM/ICM) project for the 2021 "Where and When" problem about optimizing mobile food distribution for the Food Bank of the Southern Tier (FBST). The project involves analyzing 2019 visit data from 70 regular MFP sites to design an optimized 2021 visit schedule.
 ## Problem Context
 - **Organization**: FBST serves six counties in New York State
 - **Service**: Mobile Food Pantry (MFP) program delivering nutritious food to underserved communities
 - **Capacity**: 3 trucks available, typically 2 operational per day
 - **Visit specs**: Each visit lasts ~2 hours, serves 200-250 families
 - **2019 data**: 70 regular sites with 722 total visits
 - **Goal**: Optimize 2021 schedule based on 2019 statistics
 ## Tasks (Complete 3/4)
 - **Task 1**: Propose effective & fair schedule for all 70 sites (frequency based on demand)
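 - **Task 2 OR 3**: Task 2 is either (a) reduce the number of serviced sites and optimize their locations, or (b) keep all sites but schedule visits so that longer client trips are feasible; Task 3 is two-site truck routing, including how much food to dispense at the first site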
 - **Task 4**: 1-page executive summary
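Task 1 asks for visit frequency based on demand. The sketch below shows one possible way to do that, not the project's chosen method: it splits a fixed visit budget across sites in proportion to a demand weight, using largest-remainder rounding. The function name `allocate_visits`, the `min_visits` floor, and the example numbers are all hypothetical.

```python
from math import floor


def allocate_visits(demands, total_visits, min_visits=1):
    """Split total_visits across sites in proportion to demand.

    Uses largest-remainder rounding so the result sums exactly to total_visits.
    """
    n = len(demands)
    if total_visits < n * min_visits:
        raise ValueError("not enough visits to give every site the minimum")

    # Reserve the guaranteed minimum, then share the rest proportionally.
    remaining = total_visits - n * min_visits
    total_demand = sum(demands)
    quotas = [remaining * d / total_demand for d in demands]
    visits = [min_visits + floor(q) for q in quotas]

    # Hand out any leftover visits to the largest fractional parts.
    leftover = total_visits - sum(visits)
    order = sorted(range(n), key=lambda i: quotas[i] - floor(quotas[i]), reverse=True)
    for i in order[:leftover]:
        visits[i] += 1
    return visits


# Hypothetical demand weights for three sites, not taken from the 2019 spreadsheet.
example = allocate_visits([300, 150, 50], total_visits=23)
```

The `min_visits` floor is a simple stand-in for a fairness requirement: every site keeps at least one visit regardless of how small its demand weight is.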
 ## Data Sources
 - `prob/MFP Regular Sites 2019.xlsx` - Main dataset with:
  - Site Name, latitude, longitude
  - Number of Visits in 2019
  - Average Demand per Visit
  - StDev(Demand per Visit)
 - `prob/full_bilingual.md` - Bilingual problem statement (English/Chinese)
 ## Code Structure
 ### Analysis Scripts
 - `analyze_visits.py` - Correlation and regression analysis
  - Reads Excel data using pandas
  - Pearson correlation between average demand and visit frequency
  - Linear regression analysis
  - Standard deviation analysis (coefficient of variation)
  - Multiple linear regression (demand + std_dev)
  - Outputs: `analysis_result.png`
 - `plot_sites.py` - Geographic visualization
  - Reads Excel data and plots site locations
  - Color-codes by visit frequency using viridis colormap
  - Latitude correction for proper aspect ratio
  - Outputs: `sites_map.png`
 ### Dependencies
 ```bash
 # Core libraries
 pandas numpy scipy matplotlib scikit-learn
 # Excel support
 openpyxl xlrd
 ```
 ### Running Analysis
 ```bash
 # Run visit frequency analysis
 python analyze_visits.py
 # Generate geographic map
 python plot_sites.py
 ```
 ## LaTeX Template
 Location: `latex-template/`
 ```bash
 cd latex-template
 xelatex mcmthesis-demo.tex    # Compile once
 biber mcmthesis-demo          # Run references
 xelatex mcmthesis-demo.tex    # Compile again to pull in the bibliography
 xelatex mcmthesis-demo.tex    # Final pass to resolve cross-references
 ```
 Key template elements:
 - Document class: `mcmthesis` with MCM/ICM formatting
 - Uses APA bibliography style with biblatex/biber
 - Includes abstract, table of contents, sections, appendices
 - "Report on Use of AI" section required per competition rules
 - Problem selection: in `\mcmsetup`, set `problem` to the chosen letter (A-F), e.g. `problem=\textcolor{red}{A}`
 ## Key Modeling Considerations
 From `analyze_visits.py` findings:
 - Low correlation between average demand per visit and total visit frequency
 - Scheduling decisions should consider both mean demand AND demand variability (standard deviation)
 - The coefficient of variation (CV = std_dev/mean) helps identify sites with irregular attendance
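As a minimal sketch of the CV idea above, the snippet below computes CV = std_dev/mean per site and flags sites above a cutoff. The site names, the numbers, and the 0.3 threshold are illustrative assumptions, not values from `analyze_visits.py` or the 2019 spreadsheet.

```python
# Hypothetical (mean demand, standard deviation) pairs per site.
sites = {
    "Site A": (200.0, 20.0),
    "Site B": (120.0, 60.0),
    "Site C": (80.0, 8.0),
}


def coefficient_of_variation(mean, std):
    """Relative dispersion of demand: standard deviation divided by the mean."""
    return std / mean


# Illustrative cutoff; the right value depends on the real data.
CV_THRESHOLD = 0.3

cv = {name: coefficient_of_variation(m, s) for name, (m, s) in sites.items()}
irregular = sorted(name for name, value in cv.items() if value > CV_THRESHOLD)
```

Here only "Site B" is flagged: its standard deviation is half its mean, while the other two sites vary by about a tenth of theirs.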
 ### Fairness vs Effectiveness Metrics
 - **Effectiveness**: Overall service quality across all clients
 - **Fairness**: Equity in service distribution across different communities
 - Avoid concentration of visits in few high-demand areas
 ### Scheduling Constraints
 - Max 2 trucks per day
 - Each visit ~2 hours
 - Schedule published months in advance
 - No pre-registration required (post-pandemic model)
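The truck limit translates into a lower bound on operating days. The sketch below assumes each truck makes one site visit per day unless stated otherwise; that assumption and the helper name `min_operating_days` are not from the problem statement. The second call shows the effect of the two-stop trips considered in Task 3.

```python
from math import ceil


def min_operating_days(total_visits, trucks_per_day=2, visits_per_truck=1):
    """Fewest days needed to fit total_visits under the daily truck limit."""
    return ceil(total_visits / (trucks_per_day * visits_per_truck))


# 2019 volume (722 visits) with 2 trucks, one site per truck per day (assumed).
days_one_stop = min_operating_days(722)

# Same volume if every truck could serve two sites per trip (Task 3 scenario).
days_two_stops = min_operating_days(722, visits_per_truck=2)
```

Under the one-stop assumption the 2019 volume needs at least 361 operating days, which nearly fills a calendar year and leaves little slack for weekends or holidays; with two stops per truck the bound drops to 181 days.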
 ## Development Notes
 - Python scripts were drafted with Chinese comments and console messages for analysis context
 - Data file paths relative to project root
 - Visualizations saved to root directory
 - LaTeX compilation requires XeLaTeX (for Chinese support in CTeX)
--- a/analyze_visits.py
+++ b/analyze_visits.py
@@ -0,0 +1,135 @@
 """
 Analysis: is the total number of visits determined by the average demand per visit?
 Uses correlation analysis and regression analysis.
 """
 import pandas as pd
 import numpy as np
 from scipy import stats
 import matplotlib.pyplot as plt
 # Load the data
 df = pd.read_excel('prob/MFP Regular Sites 2019.xlsx')
 # Extract the key columns
 visits = df['Number of Visits in 2019']
 avg_demand = df['Average Demand per Visit']
 std_demand = df['StDev(Demand per Visit)']
 print("=" * 60)
 print("Basic statistics")
 print("=" * 60)
 print(f"Sample size: {len(visits)}")
 print("\nNumber of visits:")
 print(f"  Mean: {visits.mean():.2f}, std: {visits.std():.2f}")
 print("\nAverage demand per visit:")
 print(f"  Mean: {avg_demand.mean():.2f}, std: {avg_demand.std():.2f}")
 # 1. Pearson correlation analysis
 print("\n" + "=" * 60)
 print("1. Pearson correlation analysis")
 print("=" * 60)
 r, p_value = stats.pearsonr(avg_demand, visits)
 print(f"Correlation coefficient r = {r:.4f}")
 print(f"p-value = {p_value:.4e}")
 print(f"Coefficient of determination R² = {r**2:.4f} (explains {r**2*100:.1f}% of the variance)")
 if p_value < 0.05:
    print("Conclusion: p < 0.05, the correlation is significant")
 else:
    print("Conclusion: p >= 0.05, the correlation is not significant")
 # 2. Linear regression
 print("\n" + "=" * 60)
 print("2. Linear regression (visits ~ average demand)")
 print("=" * 60)
 slope, intercept, r_val, p_val, std_err = stats.linregress(avg_demand, visits)
 print(f"Regression equation: visits = {slope:.4f} × average demand + {intercept:.4f}")
 print(f"Standard error of the slope: {std_err:.4f}")
 print(f"p-value: {p_val:.4e}")
 # 3. Standard deviation as a supplementary analysis
 print("\n" + "=" * 60)
 print("3. Supplementary analysis of the standard deviation")
 print("=" * 60)
 # Coefficient of variation (CV) = std / mean, a measure of relative dispersion
 cv = std_demand / avg_demand
 print("Coefficient of variation (CV = std / mean):")
 print(f"  Mean: {cv.mean():.4f}")
 print(f"  Range: {cv.min():.4f} - {cv.max():.4f}")
 # Correlation between the standard deviation and the number of visits
 r_std, p_std = stats.pearsonr(std_demand.dropna(), visits[std_demand.notna()])
 print(f"\nCorrelation between standard deviation and visits: r = {r_std:.4f}, p = {p_std:.4e}")
 # 4. Multiple regression (average demand + standard deviation -> visits)
 print("\n" + "=" * 60)
 print("4. Multiple regression (average demand and standard deviation together)")
 print("=" * 60)
 from sklearn.linear_model import LinearRegression
 # Prepare the data (drop rows with a missing standard deviation)
 mask = std_demand.notna()
 X = np.column_stack([avg_demand[mask], std_demand[mask]])
 y = visits[mask]
 model = LinearRegression()
 model.fit(X, y)
 y_pred = model.predict(X)
 ss_res = np.sum((y - y_pred) ** 2)
 ss_tot = np.sum((y - y.mean()) ** 2)
 r2_multi = 1 - ss_res / ss_tot
 print(f"Multiple-regression R² = {r2_multi:.4f} (explains {r2_multi*100:.1f}% of the variance)")
 print(f"Coefficients: average demand = {model.coef_[0]:.4f}, standard deviation = {model.coef_[1]:.4f}")
 print(f"Intercept: {model.intercept_:.4f}")
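 # 5. Summary
 print("\n" + "=" * 60)
 print("Overall conclusions")
 print("=" * 60)
 if abs(r) < 0.3:
    strength = "weak"
 elif abs(r) < 0.7:
    strength = "moderate"
 else:
    strength = "strong"
 direction = "positive" if r > 0 else "negative"
 print(f"• Average demand and number of visits show a {strength} {direction} correlation (r={r:.3f})")
 print(f"• Average demand alone explains only {r**2*100:.1f}% of the variance in visits")
 print(f"• With the standard deviation added, the model explains {r2_multi*100:.1f}% of the variance")
 if r**2 < 0.25:
    print("• Conclusion: the total number of visits is not primarily determined by average demand per visit")
 else:
    print("• Conclusion: average demand per visit has a substantial influence on the total number of visits")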
 # Plotting (English labels avoid missing-glyph boxes with matplotlib's default fonts)
 fig, axes = plt.subplots(1, 2, figsize=(12, 5))
 # Scatter plot with regression line
 ax1 = axes[0]
 ax1.scatter(avg_demand, visits, alpha=0.6, edgecolors='black', linewidth=0.5)
 x_line = np.linspace(avg_demand.min(), avg_demand.max(), 100)
 y_line = slope * x_line + intercept
 ax1.plot(x_line, y_line, 'r-', linewidth=2, label=f'Regression line (R²={r**2:.3f})')
 ax1.set_xlabel('Average Demand per Visit')
 ax1.set_ylabel('Number of Visits')
 ax1.set_title('Number of Visits vs Average Demand')
 ax1.legend()
 ax1.grid(True, alpha=0.3)
 # Residual plot
 ax2 = axes[1]
 residuals = visits - (slope * avg_demand + intercept)
 ax2.scatter(avg_demand, residuals, alpha=0.6, edgecolors='black', linewidth=0.5)
 ax2.axhline(y=0, color='r', linestyle='--', linewidth=2)
 ax2.set_xlabel('Average Demand per Visit')
 ax2.set_ylabel('Residuals')
 ax2.set_title('Residual Analysis')
 ax2.grid(True, alpha=0.3)
 plt.tight_layout()
 plt.savefig('analysis_result.png', dpi=150, bbox_inches='tight')
 print("\nFigure saved to analysis_result.png")
--- a/plot_sites.py
+++ b/plot_sites.py
@@ -0,0 +1,57 @@
 import pandas as pd
 import matplotlib.pyplot as plt
 import numpy as np
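 # Load the data
 df = pd.read_excel('MFP Regular Sites 2019.xlsx')
 # Compute the central latitude, used to correct the longitude/latitude scale
 lat_center = df['latitude'].mean()
 # Correction factor for longitude at that latitude
 aspect_ratio = 1 / np.cos(np.radians(lat_center))
 # Create the figure
 fig, ax = plt.subplots(figsize=(14, 10))
 # Draw the scatter plot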
 scatter = ax.scatter(df['longitude'], df['latitude'],
                     c=df['Number of Visits in 2019'],
                     cmap='viridis',
                     s=100,
                     alpha=0.7,
                     edgecolors='black',
                     linewidth=0.5)
 # Set the axis aspect ratio (with the latitude correction)
 ax.set_aspect(aspect_ratio)
 # Add a colorbar
 cbar = plt.colorbar(scatter, ax=ax)
 cbar.set_label('Number of Visits in 2019', fontsize=10)
 # Add site labels
 for idx, row in df.iterrows():
    ax.annotate(row['Site Name'].replace('MFP ', ''),
                (row['longitude'], row['latitude']),
                fontsize=6,
                alpha=0.7,
                xytext=(3, 3),
                textcoords='offset points')
 # Set axis labels and title
 ax.set_xlabel('Longitude', fontsize=12)
 ax.set_ylabel('Latitude', fontsize=12)
 ax.set_title('MFP Regular Sites 2019 - Geographic Distribution', fontsize=14)
 # Add a grid
 ax.grid(True, alpha=0.3)
 # Adjust the layout
 plt.tight_layout()
 # Save the figure
 plt.savefig('sites_map.png', dpi=150, bbox_inches='tight')
 print('Figure saved as sites_map.png')
 # Show the figure
 plt.show()
--- a/2019.xlsx
+++ b/2019.xlsx
--- a/prob/full_bilingual.md
+++ b/prob/full_bilingual.md
@@ -0,0 +1,88 @@
 # The "Where" and "When" of food distribution.
 # 食物分发的"地点"与"时间"
 ---
 Feeding America is a network of organizations striving to provide food security for people with limited financial resources. The Food Bank of the Southern Tier (FBST) is one of these organizations, serving six counties and nearly 4,000 square miles in New York State.
 Feeding America（供养美国）是一个致力于为经济资源有限的人群提供食品安全保障的组织网络。南部地区食品银行（FBST）是其中的一个组织，服务于纽约州的六个县，覆盖近4,000平方英里。
 ---
 In normal years, the Mobile Food Pantry (MFP) program is among the main activities of FBST. The goal is to make nutritious and healthy food more accessible to people in underserved communities. Even in areas where other agencies provide assistance, clients may not always have access to food because of limited transportation options or because those agencies are only open certain hours or days per week.
 在正常年份，移动食品储藏室（MFP）项目是FBST的主要活动之一。其目标是让服务不足社区的人们更容易获得营养健康的食物。即使在有其他机构提供援助的地区，客户也可能因交通选择有限，或因这些机构只在特定时间或每周特定几天开放而无法获得食物。
 ---
 An MFP provides food directly to clients in pre-packed boxes or through a farmers market-style distribution. When an MFP truck arrives at a site, volunteers lay out the food on tables surrounding it. The clients can then "shop", choosing items that they need. Each truck can transport up to 15,000 pounds. Three MFP trucks are available, and FBST usually has enough food and volunteers to operate 2 of them on any given day. A typical mobile pantry visit lasts two hours and provides 200 to 250 families with nutritious food to help them make ends meet. The schedule of all MFP site visits is published months in advance, to help clients plan accordingly.
 MFP通过预装箱或农贸市场式分发的方式直接向客户提供食物。当MFP卡车到达某个站点时，志愿者会在卡车周围的桌子上摆放食物。然后客户可以"购物"，选择他们需要的物品。每辆卡车可运输最多15,000磅食物。共有三辆MFP卡车可用，FBST通常在任何给定的一天都有足够的食物和志愿者来运营其中2辆。一次典型的移动储藏室访问持续两小时，可为200至250个家庭提供营养食物，帮助他们维持生计。所有MFP站点访问的时间表会提前数月公布，以帮助客户相应规划。
 ---
 Unfortunately, the COVID-19 pandemic forced FBST to modify its services. The current version of the MFP program is very limited in scope & far less flexible: offering monthly visits to only 9 major sites, requiring all clients to register in advance for each pick-up, and using socially-distanced pick-up options. FBST plans to return to a pre-pandemic level of MFP services next year and will allow clients to simply show up when a truck visits without having to sign up in advance. Your team's goal is to help them optimize the schedule for 2021 using the statistical data from 2019. In that year, MFP has serviced 70 regular sites, with 722 visits across all of them. The map (with site locations & demand serviced at each of them) is included below.
 不幸的是，COVID-19疫情迫使FBST调整了其服务。当前版本的MFP项目范围非常有限且灵活性大大降低：仅向9个主要站点提供每月访问，要求所有客户提前为每次取货进行登记，并使用保持社交距离的取货方式。FBST计划明年恢复到疫情前的MFP服务水平，并允许客户在卡车访问时直接到场，无需提前登记。你们团队的目标是利用2019年的统计数据帮助他们优化2021年的时间表。在那一年，MFP服务了70个常规站点，所有站点共计722次访问。下方包含了地图（显示站点位置及每个站点服务的需求）。
 ---
 We ask you to complete only three of the following 4 tasks: 1 + either 2 or 3 + 4.
 我们要求你只完成以下4项任务中的3项：1 + 2或3 + 4。
 ---
 ## Task 1 / 任务 1
 **Propose an effective & fair schedule for visiting all of these 70 regular sites in 2021.** The frequency of visits should be informed by the total demand in surrounding communities. Effectiveness: how well do you serve all clients on average? Fairness: are some of them served much better than others?
 **为2021年访问所有70个常规站点提出一个有效且公平的时间表。** 访问频率应根据周边社区的总需求来确定。有效性：平均而言，你能多好地服务所有客户？公平性：是否有些客户得到的服务比其他人好得多？
 ---
 ## Task 2 / 任务 2
 Based on historical data, on particularly cold winter days, the number of clients attending an MFP visit can be much lower than usual. But when the weather improves, the demand often spikes for the next visit to any site nearby. This suggests that some of the clients (those who own cars or have better public transportation options) are willing to come to pick-up sites that are a little farther. Of course, this only works if the timing of MFP visits to those nearby sites is suitable.
 根据历史数据，在特别寒冷的冬日，参加MFP访问的客户数量可能远低于平时。但当天气好转时，附近任何站点的下一次访问需求往往会激增。这表明一些客户（那些有车或有更好公共交通选择的人）愿意去稍远一点的取货站点。当然，这只有在附近站点的MFP访问时间合适的情况下才有效。
 You can try to incorporate this information in two very different ways:
 你可以尝试用两种非常不同的方式利用这些信息：
 (a) Reduce the total number of serviced sites (and optimize their location), hoping that people will still drive or take a bus to them.
 (a) 减少服务站点的总数（并优化其位置），希望人们仍会开车或乘公交前往。
 (b) Keep the number & location of sites the same, but schedule the MFP visits in a way that makes such longer trips by clients feasible (even if less desirable).
 (b) 保持站点数量和位置不变，但以一种使客户进行较长距离出行可行（即使不太理想）的方式安排MFP访问。
 Modify your previous scheduling approach to incorporate one of the above two options. Quantify the resulting improvements in performance.
 修改你之前的调度方法以纳入上述两个选项之一。量化由此带来的性能改进。
 ---
 ## Task 3 / 任务 3
 To optimize the volunteers' efforts, the FBST is also considering a new option of sending the same truck to visit two different sites on some of the trips. Along with the challenge of selecting the sites and the date, this would also make it necessary to decide on the amount of food to dispense at the first site (since without pre-registration the demand at the next site is not known for sure). Suggest an algorithm to address this, characterizing both the effectiveness & fairness of the resulting distribution.
 为了优化志愿者的工作，FBST还在考虑一个新选项：在某些行程中派同一辆卡车访问两个不同的站点。除了选择站点和日期的挑战外，这还需要决定在第一个站点分发多少食物（因为在没有预先登记的情况下，下一个站点的需求是不确定的）。建议一种算法来解决这个问题，描述所得分配的有效性和公平性。
 ---
 ## Task 4 / 任务 4
 Along with your technical manuscript, please submit a 1-page executive summary, describing the key advantages and potential drawbacks of your recommendations.
 除了技术手稿外，请提交一份1页的执行摘要，描述你建议的主要优势和潜在缺点。
 ---
 ![](images/05746b4a5d7cf8837e3088440c699644fc5787073ab2c767d81af5c786fca25b.jpg)
 Location of 70 MFP distribution sites across the six counties of New York State regularly serviced by FBST in 2019. (The above omits 12 more sites, which were serviced only occasionally – less than 5 times each in 2019.) The color of disks indicates the average demand (the number of clients serviced per visit) for each of the sites. Most of these sites are unfortunately not serviced in 2020. Graphical information summary by Dr. F. Alkaabneh. A spreadsheet version of this dataset is also available here.
 2019年FBST在纽约州六个县定期服务的70个MFP分发站点位置。（上图省略了另外12个站点，这些站点仅偶尔提供服务——2019年每个站点不到5次。）圆盘的颜色表示每个站点的平均需求（每次访问服务的客户数量）。不幸的是，这些站点中的大多数在2020年没有提供服务。图形信息摘要由F. Alkaabneh博士提供。此数据集的电子表格版本也可在此处获取。
--- a/prob/m_cn.pdf
+++ b/prob/m_cn.pdf