% Define memo-specific colors
\definecolor{fbstblue}{RGB}{0, 51, 102}
\definecolor{fbstgold}{RGB}{218, 165, 32}
\setlength{\headheight}{14pt} % avoid fancyhdr headheight warnings
\setlength{\cftbeforesecskip}{6pt}
\renewcommand{\contentsname}{\hspace*{\fill}\Large\bfseries Contents \hspace*{\fill}}

\centering
\includegraphics[width=12cm]{flow.png}
\caption{The flow chart of our work}
\label{fig:workflow_flow}
\end{figure}

\section{Assumptions and Justifications}
\section{Sensitivity Analysis}
\subsection{Purpose, Decision Rule, and Tested Ranges}
Sensitivity analysis verifies that the key conclusions---improved effectiveness under feasibility constraints and controlled risk---remain stable under reasonable perturbations of a small set of threshold parameters. Throughout this section, we interpret parameter choices using the following decision rule: we prefer settings that (i) maintain feasibility and the equity floor enforced by the scheduling constraints (e.g., minimum service frequency), (ii) keep the shortfall risk below an acceptable level (e.g., $R_1 \le 5\%$), and (iii) maximize distribution effectiveness, prioritizing the quality-weighted service metric E2 when trade-offs exist.

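The decision rule above can be sketched as a simple filter-then-maximize selection. This is an illustrative sketch, not the paper's actual code; the field names (\texttt{feasible}, \texttt{r1}, \texttt{e1}, \texttt{e2}) and the candidate values are assumptions for demonstration.

```python
# Sketch of the parameter-selection decision rule described above.
# Field names and candidate values are illustrative, not from the paper.

def select_setting(candidates, r1_cap=0.05):
    """Pick the candidate that is feasible, keeps shortfall risk R1 below
    the cap, and maximizes quality-weighted effectiveness E2 (ties -> E1)."""
    admissible = [c for c in candidates
                  if c["feasible"] and c["r1"] <= r1_cap]
    if not admissible:
        return None
    return max(admissible, key=lambda c: (c["e2"], c["e1"]))

candidates = [
    {"feasible": True,  "r1": 0.03, "e1": 0.82, "e2": 0.79},
    {"feasible": True,  "r1": 0.08, "e1": 0.90, "e2": 0.88},  # violates R1 cap
    {"feasible": False, "r1": 0.02, "e1": 0.85, "e2": 0.84},  # infeasible
]
best = select_setting(candidates)  # only the first candidate is admissible
```

Note that a higher-E2 candidate is rejected when it breaches the risk cap: constraints (i)--(ii) are hard filters, and only then is effectiveness maximized.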
\begin{table}[H]
\footnotesize
\centering
\caption{Key parameters and tested ranges in sensitivity analysis}
\label{tab:sensitivity_params}
\begin{tabular}{@{}llllp{4.2cm}@{}}
\toprule[1pt]
Parameter & Meaning & Baseline & Tested range & Rationale for range \\
\midrule[0.6pt]
$C_0$ & Capacity threshold for correction (Task 1) & 350 & 350--450 & Around the observed ``capacity ceiling'' \\
$p_0^{trunc}$ & Truncation probability threshold (Task 1) & 0.1 & 0.01--0.10 & From conservative to permissive correction \\
$\hat c$ & Quality threshold used in E2 (Task 1) & 250 & 200--300 & Covers typical quality cutoffs \\
Merge ratio & Portion merged into dual-stop routes (Task 3) & 0.5 & 0.1--0.9 & From light to aggressive merging \\
$l_{max}$ & Maximum pairing distance in miles (Task 3) & 50 & 10--100 & From tight local pairing to relaxed pairing \\
$\mu_{\text{sum}}$ & Cap on paired expected demand (Task 3) & 450 & 350--550 & Controls combined-stop overload risk \\
CV threshold & Maximum volatility admitted for pairing (Task 3) & 0.5 & 0.1--1.0 & Filters unstable sites vs.\ broader admission \\
\bottomrule[1pt]
\end{tabular}
\end{table}

\subsection{Sensitivity Analysis for Task 1}

\begin{figure}[H]
\centering
\includegraphics[width=11cm]{task1.png}
\caption{Sensitivity Analysis for Task 1}
\label{fig:sens_task1}
\end{figure}
To verify the robustness of the Task 1 correction-and-allocation pipeline, we conduct a one-at-a-time sensitivity analysis around the baseline setting $(C_0,\,p_0^{trunc},\,\hat c)=(350,\,0.1,\,250)$. In total we run 20 configurations, varying one threshold while holding the others fixed, and record the changes in the key effectiveness indicators (E1/E2) and in the number of corrected sites, together with their relative deviations from the baseline.
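The one-at-a-time sweep can be sketched as follows. This is a minimal illustration of the experimental design, not the paper's code; the grid sizes are assumptions chosen so that the configuration count matches the 20 runs reported above, and \texttt{evaluate} (the correction-and-allocation pipeline itself) is deliberately left out.

```python
# One-at-a-time (OAT) sweep around the Task 1 baseline described above.
# Grid sizes are illustrative; 8 + 6 + 6 = 20 configurations in total.
import numpy as np

baseline = {"C0": 350.0, "p_trunc": 0.10, "c_hat": 250.0}
grids = {
    "C0":      np.linspace(350, 450, 8),   # capacity threshold range
    "p_trunc": np.linspace(0.01, 0.10, 6), # truncation probability range
    "c_hat":   np.linspace(200, 300, 6),   # quality threshold range
}

def oat_configs(baseline, grids):
    """Yield (varied_name, config) pairs, changing one parameter at a time
    while holding the other thresholds at their baseline values."""
    for name, values in grids.items():
        for v in values:
            cfg = dict(baseline)
            cfg[name] = float(v)
            yield name, cfg

configs = list(oat_configs(baseline, grids))  # 20 configurations
```

Each configuration would then be fed through the Task 1 pipeline, recording E1, E2, and the number of corrected sites relative to the baseline run.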
The results show that $C_0$ is the dominant driver of outcome variability: raising the capacity threshold reduces E1 and decreases the number of sites flagged for correction (Fig.~\ref{fig:sens_task1}(a--b)), consistent with the model's mechanism---a lower ``capacity ceiling'' flags more truncated-demand sites and unlocks more corrected service volume. By contrast, perturbing the truncation probability threshold $p_0^{trunc}$ causes much smaller changes in E1 (Fig.~\ref{fig:sens_task1}(c)), so the correction rule is not overly sensitive to this hyper-parameter in the tested range. Finally, since the quality-weighted metric E2 depends on the quality threshold $\hat c$, Fig.~\ref{fig:sens_task1}(d) reports its monotone effect on E2 and confirms that the chosen baseline $\hat c=250$ lies in a stable regime. Overall, these patterns indicate that the baseline parameters are reliable and do not alter the core model philosophy of ``demand-based allocation with equity protection''.

In practice, we recommend keeping $C_0$ near the lower end of the tested range (around 350) to avoid missing truncated-demand sites; $p_0^{trunc}$ can vary within the tested interval without materially changing E1; and $\hat c$ should be chosen according to the program's definition of ``acceptable quality,'' with $\hat c=250$ providing a balanced baseline in Fig.~\ref{fig:sens_task1}(d).

\subsection{Sensitivity Analysis for Task 3}
To validate the reliability of the dual-site collaborative scheduling model, we perform a sensitivity analysis on the key pairing and merging thresholds of Task 3, reporting how the effectiveness metrics (E1/E2) and the shortfall risk (R1) respond to changes in each threshold while the others are held at the baseline.

\begin{figure}[H]
\centering
\includegraphics[width=11cm]{task3.png}
\caption{Sensitivity Analysis for Task 3}
\label{fig:sens_task3}
\end{figure}
Our analysis yields the following observations:
\begin{itemize}
\item \textbf{Merge Ratio:} A higher merge ratio increases both E1 and E2 by consolidating more workload into dual-stop routes, while R1 remains nearly flat around the baseline. We adopt a moderate baseline merge ratio of $0.5$ (vertical reference line) because it sits near the onset of diminishing returns: further increases yield smaller marginal gains while relying on more aggressive merging behavior.
\item \textbf{Distance Threshold ($l_{max}$):} As $l_{max}$ increases, the feasible pairing set expands (from 27 to 34 symbiotic pairs as $l_{max}$ grows from 10 to roughly 22 miles) and E1/E2 rise quickly at first, then enter a clear plateau with diminishing returns; meanwhile, R1 is more volatile at very small $l_{max}$ and stabilizes after moderate relaxation. To maintain operational feasibility while ensuring enough pairing options, we set $l_{max}=50$ (reference line), well inside the plateau region where additional relaxation brings negligible effectiveness gains.
\item \textbf{Capacity Cap ($\mu_{\text{sum}}$):} Tightening the cap (smaller $\mu_{\text{sum}}$) restricts which pairs can be formed and reduces E1/E2, while loosening it admits higher-demand combined stops but can increase shortfall risk. We use $\mu_{\text{sum}}=450$ as a balanced baseline (reference line) because it achieves near-maximum effectiveness without moving further into the higher-risk regime implied by looser caps.
\item \textbf{Volatility Threshold (CV threshold):} Increasing the CV threshold initially improves E1/E2 as more complementary pairs are admitted, then quickly reaches a plateau (peaking around $0.56$); at the same time, R1 rises and stabilizes. We select a baseline CV threshold of $0.5$ (reference line) to sit at the start of the effectiveness plateau, where marginal gains beyond $0.5$ are small compared with the associated risk exposure.
\end{itemize}
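The three pairing thresholds act jointly as an admission filter on candidate site pairs, which can be sketched as below. This is an illustrative sketch of the rule implied by the thresholds, not the paper's implementation; the field names (\texttt{dist}, \texttt{mu\_i}, \texttt{mu\_j}, \texttt{cv\_i}, \texttt{cv\_j}) and the sample pairs are assumptions.

```python
# Sketch of the Task 3 pairing-admission rule implied by the thresholds above.
# Field names and the sample candidate pairs are illustrative assumptions.

def admissible_pair(pair, l_max=50.0, mu_sum_cap=450.0, cv_cap=0.5):
    """A candidate dual-stop pair is admitted only if the two sites are
    close enough, their combined expected demand fits under the cap, and
    both sites are stable enough (coefficient of variation below the cap)."""
    return (pair["dist"] <= l_max
            and pair["mu_i"] + pair["mu_j"] <= mu_sum_cap
            and max(pair["cv_i"], pair["cv_j"]) <= cv_cap)

pairs = [
    {"dist": 22.0, "mu_i": 180.0, "mu_j": 240.0, "cv_i": 0.3, "cv_j": 0.4},
    {"dist": 60.0, "mu_i": 100.0, "mu_j": 120.0, "cv_i": 0.2, "cv_j": 0.2},  # too far
    {"dist": 15.0, "mu_i": 300.0, "mu_j": 200.0, "cv_i": 0.3, "cv_j": 0.3},  # over demand cap
]
admitted = [p for p in pairs if admissible_pair(p)]  # only the first pair passes
```

Loosening any single threshold enlarges the admitted set, which is exactly why the sensitivity curves above plateau: beyond a point, the additional pairs contribute little effectiveness while raising risk exposure.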
In summary, the selected parameters sustain both a high total service volume and strong food distribution effectiveness, indicating that the model is reliable and robust under varying conditions.
Notably, multiple parameters exhibit clear ``knee'' or plateau behavior in Fig.~\ref{fig:sens_task3}; hence, our baselines are chosen to lie near the start of the plateau region (high marginal effectiveness) while avoiding unnecessarily large risk exposure.
\section{Strengths and Weaknesses}