saketh1201 committed on
Commit
ab7c428
·
verified ·
1 Parent(s): 2923e9c

Upload folder using huggingface_hub

Files changed (14)
  1. Dockerfile +29 -0
  2. README.md +149 -10
  3. __init__.py +0 -0
  4. client.py +67 -0
  5. curl +0 -0
  6. inference.py +228 -0
  7. models.py +29 -0
  8. openenv.yaml +6 -0
  9. pyproject.toml +18 -0
  10. server/__init__.py +0 -0
  11. server/app.py +14 -0
  12. server/constants.py +178 -0
  13. server/grader.py +135 -0
  14. server/inventory_env.py +246 -0
Dockerfile ADDED
@@ -0,0 +1,29 @@
1
+ FROM ghcr.io/meta-pytorch/openenv-base:latest AS builder
2
+
3
+ RUN apt-get update && apt-get install -y git curl && \
4
+ curl -LsSf https://astral.sh/uv/install.sh | sh
5
+ ENV PATH="/root/.local/bin:$PATH"
6
+
7
+ WORKDIR /app
8
+ COPY pyproject.toml uv.lock* ./
9
+ RUN uv sync --frozen || uv sync
10
+ COPY . .
11
+ RUN uv sync
12
+
13
+ FROM ghcr.io/meta-pytorch/openenv-base:latest
14
+
15
+ WORKDIR /app
16
+ COPY --from=builder /app/.venv /app/.venv
17
+ COPY --from=builder /app /app
18
+
19
+ ENV PATH="/app/.venv/bin:$PATH"
20
+ ENV PYTHONUNBUFFERED=1
21
+ ENV PYTHONPATH="/app:$PYTHONPATH"
22
+
23
+ EXPOSE 8000
24
+
25
+ HEALTHCHECK --interval=30s --timeout=3s \
26
+ CMD curl -f http://localhost:8000/health || exit 1
27
+
28
+ ENV ENABLE_WEB_INTERFACE=true
29
+ CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "8000"]
README.md CHANGED
@@ -1,10 +1,149 @@
1
- ---
2
- title: Inventory Env
3
- emoji: 🔥
4
- colorFrom: purple
5
- colorTo: blue
6
- sdk: docker
7
- pinned: false
8
- ---
9
-
10
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
+ ---
2
+ title: Inventory Optimization Environment
3
+ emoji: 📦
4
+ colorFrom: blue
5
+ colorTo: green
6
+ sdk: docker
7
+ app_port: 8000
8
+ tags:
9
+ - openenv
10
+ base_path: /web
11
+ ---
12
+
13
+ # Retail Inventory Optimization Environment
14
+
15
+ An OpenEnv reinforcement learning environment that simulates day-by-day retail inventory management across 5 product categories. An AI agent must decide what to buy, how to ship, and what to liquidate to maximize profit over a 30-day episode.
16
+
17
+ ## Environment Description
18
+
19
+ You manage a retail store selling 5 products with different characteristics:
20
+
21
+ | Product | Sell Price | Cost Price | Profit Margin | Shelf Life |
22
+ |---------|-----------|------------|---------------|------------|
23
+ | Electronics | $150 | $100 | $50 | No expiry |
24
+ | Clothing | $40 | $25 | $15 | No expiry |
25
+ | Groceries | $10 | $5 | $5 | 5 days |
26
+ | Furniture | $200 | $130 | $70 | No expiry |
27
+ | Toys | $25 | $12 | $13 | No expiry |
28
+
29
+ Each day, customer demand is generated (with weekend boosts and event spikes). The agent must keep stock levels high enough to meet demand while managing cash flow, shipping delays, warehouse capacity, and perishable goods.
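A minimal sketch of how one product's daily demand could be drawn, assuming a uniform integer draw from a `(min, max)` range, the 1.2x weekend multiplier from `server/constants.py`, and the weekend rule `day % 7 in (5, 6)` quoted in the agent prompt. The helper name `sample_demand` and the rounding step are assumptions for illustration; the environment's own generator may differ.

```python
import random

WEEKEND_MULTIPLIER = 1.2  # value taken from server/constants.py


def sample_demand(day, base_range, event_multiplier=1.0, rng=None):
    """Hypothetical helper: draw one product's demand for a given day."""
    rng = rng or random.Random(0)      # fixed seed keeps the sketch reproducible
    lo, hi = base_range
    demand = rng.randint(lo, hi)       # base demand, bounds inclusive
    if day % 7 in (5, 6):              # weekend rule as stated in the agent prompt
        demand *= WEEKEND_MULTIPLIER
    demand *= event_multiplier         # e.g. 3.0 for electronics during black_friday
    return int(round(demand))          # rounding to whole units is an assumption


# Medium-task electronics range (3, 8) on a weekend with the black_friday multiplier
print(sample_demand(day=6, base_range=(3, 8), event_multiplier=3.0))
```

Passing a shared `random.Random(seed)` as `rng` would make a whole episode reproducible, which is presumably what the per-task `seed` values are for.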
30
+
31
+ ## Action Space
32
+
33
+ ```python
34
+ class InventoryAction(Action):
35
+ buy_quantities: Dict[str, int] = {} # product -> quantity to order
36
+ delivery_method: Literal["slow", "medium", "fast"] = "slow"
37
+ liquidate: Dict[str, int] = {} # product -> quantity to dispose
38
+ ```
39
+
40
+ | Field | Description |
41
+ |-------|-------------|
42
+ | `buy_quantities` | Products and amounts to order. Empty `{}` to skip buying. |
43
+ | `delivery_method` | `"slow"` ($2/unit, 5 days), `"medium"` ($5/unit, 3 days), `"fast"` ($10/unit, 1 day) |
44
+ | `liquidate` | Products and amounts to dispose of (no revenue). Use for expiring groceries or freeing warehouse space. |
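To make the trade-off between delivery methods concrete, the sketch below estimates the cash needed for one order as `quantity * (cost price + per-unit shipping)`. This mirrors the rough affordability check in `server/grader.py`; the helper name `order_cost` is hypothetical, and the environment may charge additional fees (for example for stock beyond warehouse capacity) that are not modeled here.

```python
# Values copied from server/constants.py
COST_PRICES = {
    "electronics": 100.0,
    "clothing": 25.0,
    "groceries": 5.0,
    "furniture": 130.0,
    "toys": 12.0,
}
SHIPPING_COST = {"slow": 2.0, "medium": 5.0, "fast": 10.0}
SHIPPING_DAYS = {"slow": 5, "medium": 3, "fast": 1}


def order_cost(buy_quantities, delivery_method):
    """Hypothetical helper: purchase cost plus per-unit shipping for one order."""
    shipping = SHIPPING_COST[delivery_method]
    return sum(qty * (COST_PRICES[p] + shipping) for p, qty in buy_quantities.items())


order = {"groceries": 40, "toys": 10}
for method in ("slow", "medium", "fast"):
    print(method, SHIPPING_DAYS[method], "days", order_cost(order, method))
```

For low-priced items such as groceries, fast shipping triples the per-unit cost ($5 + $10), so the delivery method matters far more than it does for furniture.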
45
+
46
+ ## Observation Space
47
+
48
+ ```python
49
+ class InventoryObservation(Observation):
50
+ current_day: int
51
+ total_cash: float
52
+ day_profit: float
53
+ total_profit: float
54
+ demand_today: Dict[str, int]
55
+ updated_inventory: Dict[str, List[List[Optional[int]]]] # [[qty, days_left], ...]
56
+ remaining_capacity: Dict[str, int]
57
+ updated_events: Dict[str, int]
58
+ updated_deliveries: List[Dict[str, List[int]]]
59
+ ```
60
+
61
+ The inventory uses a batch format with FIFO selling: `{"groceries": [[20, 3], [10, 5]]}` means 20 units expiring in 3 days and 10 units expiring in 5 days.
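The FIFO rule can be sketched as follows, assuming batches are stored oldest-first so the front of the list sells first. The function name `sell_fifo` is hypothetical and this is not the environment's own code, only an illustration of the batch format.

```python
def sell_fifo(batches, demand):
    """Sell up to `demand` units, draining the earliest batches first.

    `batches` is a list of [qty, days_left] pairs (days_left is None for
    non-perishable products). Returns (units_sold, remaining_batches).
    """
    sold = 0
    remaining = []
    for qty, days_left in batches:
        take = min(qty, demand - sold)   # never sell more than is still demanded
        sold += take
        if qty - take > 0:               # keep whatever is left of this batch
            remaining.append([qty - take, days_left])
    return sold, remaining


sold, left = sell_fifo([[20, 3], [10, 5]], demand=25)
print(sold, left)  # 25 [[5, 5]]
```

Selling the batch closest to expiry first reduces the number of groceries that spoil, which is why the ordering of batches matters.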
62
+
63
+ ## Tasks (Easy / Medium / Hard)
64
+
65
+ ### Easy — "Steady State"
66
+ - Low starting stock, low steady demand, no events
67
+ - Starting cash: $1,000 | Full warehouse capacity
68
+ - Agent needs to restock regularly but demand is predictable
69
+
70
+ ### Medium — "Seasonal Rush"
71
+ - Default stock/cash, all 5 events spread across 30 days
72
+ - Events: Black Friday (day 6), Christmas (day 12), Back to School (day 18), Summer Clearance (day 24), New Competitor (day 28)
73
+ - Agent must anticipate demand spikes and restock accordingly
74
+
75
+ ### Hard — "Chaos Mode"
76
+ - Half starting cash ($500), low stock, events packed close together
77
+ - Higher demand, smaller warehouse capacity
78
+ - Agent must balance tight budget, overlapping event spikes, and fast-expiring groceries
79
+
80
+ ## Reward Function
81
+
82
+ Per-step reward based on multiple signals:
83
+ - **Successful sales**: `+sold_units * sell_price * 0.001` (proportional to revenue)
84
+ - **Missed sales**: `-missed_units * sell_price * 0.001` (proportional to lost revenue)
85
+ - **Expired groceries**: `-0.05 * expired_count`
86
+ - **Failed purchases**: `-0.5` per order that exceeds available cash
87
+ - **Liquidation loss**: `-liquidated_value * 0.001` (proportional to cost of disposed stock)
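The signals listed above could be combined into a single per-step reward roughly as sketched here. The function name `step_reward` and its argument layout are assumptions made for illustration; only the coefficients come from the list above.

```python
# Selling prices from the product table
SELL_PRICES = {
    "electronics": 150.0,
    "clothing": 40.0,
    "groceries": 10.0,
    "furniture": 200.0,
    "toys": 25.0,
}


def step_reward(sold, missed, expired_count, failed_orders, liquidated_value):
    """Hypothetical helper: sum the reward signals for one day.

    sold / missed: dicts of product -> units
    expired_count: number of grocery units that expired
    failed_orders: number of orders rejected for exceeding available cash
    liquidated_value: cost value of the stock disposed of
    """
    reward = 0.0
    for product, units in sold.items():
        reward += units * SELL_PRICES[product] * 0.001
    for product, units in missed.items():
        reward -= units * SELL_PRICES[product] * 0.001
    reward -= 0.05 * expired_count
    reward -= 0.5 * failed_orders
    reward -= liquidated_value * 0.001
    return reward


print(step_reward({"toys": 10}, {"electronics": 2}, 4, 1, 50.0))
```

With these coefficients a single failed purchase (-0.5) outweighs the reward for selling 20 toys (+0.5), so staying within budget dominates the per-step signal.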
88
+
89
+ ## Grading (0.0 - 1.0)
90
+
91
+ Each task is scored by comparing agent profit against two baselines:
92
+ - **Floor**: Passive agent that never buys (sells initial stock until empty)
93
+ - **Ceiling**: Smart heuristic that restocks based on demand and events
94
+
95
+ ```
96
+ score = clamp((agent_profit - floor) / (ceiling - floor), 0.0, 1.0)
97
+ ```
98
+
99
+ ## Setup
100
+
101
+ ```bash
102
+ # Install dependencies
103
+ pip install "openenv-core[core]" fastapi uvicorn pydantic openai numpy python-dotenv
104
+
105
+ # Run grader baselines
106
+ python -c "from server.grader import compute_baselines; [print(f'{t}: floor={f:.2f}, ceiling={c:.2f}') for t in ['easy','medium','hard'] for f,c in [compute_baselines(t)]]"
107
+
108
+ # Start server locally
109
+ uvicorn server.app:app --host 0.0.0.0 --port 8000
110
+
111
+ # Test endpoints
112
+ curl http://localhost:8000/health
113
+ curl -X POST http://localhost:8000/reset
114
+ ```
115
+
116
+ ## Running Inference
117
+
118
+ ```bash
119
+ export API_BASE_URL="https://router.huggingface.co/v1"
120
+ export MODEL_NAME="Qwen/Qwen2.5-72B-Instruct"
121
+ export HF_TOKEN="your-token"
122
+ python inference.py
123
+ ```
124
+
125
+ ## Docker
126
+
127
+ ```bash
128
+ docker build -t inventory-env .
129
+ docker run -p 8000:8000 inventory-env
130
+ ```
131
+
132
+ ## Project Structure
133
+
134
+ ```
135
+ V2/
136
+ ├── models.py # InventoryAction, InventoryObservation, InventoryState
137
+ ├── client.py # EnvClient for remote WebSocket connections
138
+ ├── inference.py # LLM inference script (runs all 3 tasks)
139
+ ├── openenv.yaml # OpenEnv spec manifest
140
+ ├── pyproject.toml # Python dependencies
141
+ ├── Dockerfile # Container build
142
+ ├── server/
143
+ │ ├── app.py # FastAPI server (create_app)
144
+ │ ├── inventory_env.py # Environment (reset, step, state)
145
+ │ ├── constants.py # Prices, stock, events, task configs
146
+ │ └── grader.py # Floor/ceiling baselines and scoring
147
+ └── scripts/
148
+ └── validate-submission.sh # Pre-submission validator
149
+ ```
__init__.py ADDED
File without changes
client.py ADDED
@@ -0,0 +1,67 @@
1
+ from __future__ import annotations
2
+
3
+ from typing import Any, Dict
4
+
5
+ from openenv.core.client_types import StepResult
6
+ from openenv.core.env_client import EnvClient
7
+
8
+ from models import InventoryAction, InventoryObservation, InventoryState
9
+
10
+
11
+
12
+ class InventoryEnv(EnvClient[InventoryAction, InventoryObservation, InventoryState]):
13
+
14
+
15
+ def _step_payload(self, action : InventoryAction) -> Dict[str, Any]:
16
+
17
+ payload: Dict[str, Any] = {}
18
+
19
+ if action.buy_quantities is not None:
20
+ payload["buy_quantities"] = action.buy_quantities
21
+
22
+ if action.delivery_method is not None:
23
+ payload["delivery_method"] = action.delivery_method
24
+
25
+ if getattr(action, "upgrade_delivery", None) is not None:  # field is not declared on InventoryAction
26
+ payload["upgrade_delivery"] = action.upgrade_delivery
27
+
28
+ if action.liquidate is not None:
29
+ payload["liquidate"] = action.liquidate
30
+
31
+ return payload
32
+
33
+
34
+ def _parse_result(self, payload: Dict) -> StepResult[InventoryObservation]:
35
+
36
+ obs_data = payload.get("observation", {})
37
+
38
+ observation = InventoryObservation(
39
+ remaining_capacity = obs_data.get("remaining_capacity", {}),  # required by InventoryObservation
40
+ current_day = obs_data.get("current_day", 0),
41
+ total_cash = obs_data.get("total_cash", 0),
42
+ day_profit = obs_data.get("day_profit", 0),
43
+ total_profit = obs_data.get("total_profit", 0),
44
+ demand_today = obs_data.get("demand_today", {}),
45
+ updated_inventory = obs_data.get("updated_inventory", {}),
46
+ updated_events = obs_data.get("updated_events", {}),
47
+ updated_deliveries = obs_data.get("updated_deliveries", []),
48
+ done = obs_data.get("done", False),
49
+ reward = obs_data.get("reward", 0.0),
50
+ metadata=obs_data.get("metadata", {}),
51
+ )
52
+
53
+ return StepResult(
54
+ observation = observation,
55
+ reward = observation.reward,
56
+ done = observation.done,
57
+ )
58
+
59
+
60
+ def _parse_state(self, payload: Dict[str, Any]) -> InventoryState:
61
+
62
+ return InventoryState(
63
+ episode_id = payload.get("episode_id", ""),
64
+ current_day = payload.get("current_day", 0),
65
+ cash = payload.get("cash", 0.0),
66
+ inventory = payload.get("inventory", {}),
67
+ )
curl ADDED
File without changes
inference.py ADDED
@@ -0,0 +1,228 @@
1
+ """
2
+ Inference Script — Inventory Optimization Environment
3
+ =======================================================
4
+ Required env vars:
5
+ API_BASE_URL The API endpoint for the LLM.
6
+ MODEL_NAME The model identifier to use for inference.
7
+ HF_TOKEN Your Hugging Face / API key.
8
+ """
9
+
10
+ import os
11
+ import json
12
+ import textwrap
13
+
14
+ from openai import OpenAI
15
+
16
+ from server.inventory_env import InventoryEnvironment
17
+ from server.constants import EXTRA_INVENTORY_COST
18
+ from models import InventoryAction
19
+
20
+ from dotenv import load_dotenv
21
+ load_dotenv()
22
+
23
+ API_BASE_URL = os.getenv("API_BASE_URL") or "https://router.huggingface.co/v1"
24
+ API_KEY = os.getenv("HF_TOKEN") or os.getenv("API_KEY")
25
+ MODEL_NAME = os.getenv("MODEL_NAME")
26
+ MAX_DAYS = 30
27
+
28
+ SYSTEM_PROMPT = textwrap.dedent("""
29
+ You are an inventory management AI agent. Each day you receive the current state
30
+ of a retail store with 5 products: electronics, clothing, groceries, furniture, toys.
31
+
32
+ Groceries are perishable (5-day shelf life). Other products don't expire.
33
+
34
+ Product selling prices: electronics=$150, clothing=$40, groceries=$10, furniture=$200, toys=$25
35
+ Product cost prices: electronics=$100, clothing=$25, groceries=$5, furniture=$130, toys=$12
36
+ Profit margins: electronics=$50, clothing=$15, groceries=$5, furniture=$70, toys=$13
37
+ Shipping costs per unit: slow=$2 (5 days), medium=$5 (3 days), fast=$10 (1 day)
38
+ Warehouse capacity: electronics=100, clothing=200, groceries=500, furniture=50, toys=300
39
+
40
+ Events (like black_friday, christmas) boost demand when their countdown hits 0.
41
+ Weekends (day%7 == 5 or 6) have 1.2x demand.
42
+
43
+ CRITICAL STRATEGY:
44
+ - You MUST restock products when inventory is low. If you don't buy, you run out of
45
+ stock and miss sales. Missed sales = lost revenue = negative reward.
46
+ - Check today's demand to estimate tomorrow's needs.
47
+ - Do NOT overbuy when demand is low: unsold stock ties up cash and warehouse space, and perishables expire.
48
+ - Prioritize high-margin products: furniture ($70 profit), electronics ($50 profit).
49
+ - Stock up BEFORE events hit (check event countdowns).
50
+
51
+ Each day you must respond with a JSON action:
52
+ {
53
+ "buy_quantities": {"product_name": quantity, ...},
54
+ "delivery_method": "slow" | "medium" | "fast",
55
+ "liquidate": {"product_name": quantity, ...}
56
+ }
57
+
58
+ - buy_quantities: products and amounts to order.
59
+ - delivery_method: shipping speed for this order
60
+ - liquidate: products and amounts to dispose of (no revenue, empty {} to skip)
61
+ Use liquidate to free up warehouse space before a restock.
62
+
63
+ You will see what demand occurred today AFTER it happened. Use this to spot trends
64
+ and plan restocking. A negative reward means your last action was bad — adjust.
65
+
66
+ Do NOT buy more than you can afford. Do NOT buy on the last day.
67
+ Respond with ONLY valid JSON, no explanation.
68
+ """).strip()
69
+
70
+
71
+ def format_observation(obs):
72
+ """Convert observation into a readable prompt for the LLM."""
73
+
74
+ # format inventory with batch detail, remaining capacity, and extra cost
75
+ inv_lines = []
76
+ for product, batches in obs.updated_inventory.items():
77
+ total = sum(b[0] for b in batches)
78
+ remaining = obs.remaining_capacity.get(product, 0)
79
+ extra_cost = EXTRA_INVENTORY_COST.get(product, 0)
80
+ batch_detail = ", ".join(
81
+ f"{b[0]} units" + (f" ({b[1]}d left)" if b[1] is not None else "")
82
+ for b in batches
83
+ )
84
+ inv_lines.append(f" {product}: {total} total [{batch_detail}] | space left: {remaining} (extra space: ${extra_cost}/unit)")
85
+ inv_text = "\n".join(inv_lines)
86
+
87
+ # format events
88
+ event_lines = []
89
+ for event, days in obs.updated_events.items():
90
+ if days > 0:
91
+ event_lines.append(f" {event}: in {days} days")
92
+ else:
93
+ event_lines.append(f" {event}: ACTIVE NOW")
94
+ events_text = "\n".join(event_lines) if event_lines else " None"
95
+
96
+ # format deliveries
97
+ delivery_lines = []
98
+ for delivery in obs.updated_deliveries:
99
+ for product, shipment in delivery.items():
100
+ qty, arrival_day = shipment
101
+ days_away = arrival_day - obs.current_day
102
+ delivery_lines.append(f" {product}: {qty} units arriving in {days_away} days")
103
+ deliveries_text = "\n".join(delivery_lines) if delivery_lines else " None"
104
+
105
+ # format demand (already happened today — feedback, not prediction)
106
+ demand_lines = []
107
+ for product, units in obs.demand_today.items():
108
+ demand_lines.append(f" {product}: {units} units")
109
+ demand_text = "\n".join(demand_lines) if demand_lines else " No demand data yet"
110
+
111
+ prompt = f"""Day: {obs.current_day}/{MAX_DAYS}
112
+ Cash: ${obs.total_cash:.2f}
113
+ Day Profit: ${obs.day_profit:.2f}
114
+ Total Profit: ${obs.total_profit:.2f}
115
+ Last Step Reward: {obs.reward:.3f}
116
+
117
+ Inventory:
118
+ {inv_text}
119
+
120
+ Demand That Occurred Today:
121
+ {demand_text}
122
+
123
+ Upcoming Events:
124
+ {events_text}
125
+
126
+ Pending Deliveries:
127
+ {deliveries_text}
128
+
129
+ Respond with your action as JSON."""
130
+
131
+ return prompt
132
+
133
+
134
+ def parse_action(response_text):
135
+ """Parse LLM response into InventoryAction."""
136
+ try:
137
+ text = response_text.strip()
138
+ if text.startswith("```"):
139
+ text = text.split("\n", 1)[1]
140
+ text = text.rsplit("```", 1)[0]
141
+
142
+ data = json.loads(text)
143
+ return InventoryAction(**data)
144
+ except Exception:
145
+ print(response_text)
146
+ return InventoryAction(
147
+ buy_quantities={},
148
+ delivery_method="slow",
149
+ liquidate={},
150
+ )
151
+
152
+
153
+ def run_task(client, task_name):
154
+ """Run a single task and return total profit."""
155
+ env = InventoryEnvironment(task_name)
156
+ obs = env.reset()
157
+
158
+ print(f"\n{'=' * 50}")
159
+ print(f"Task: {task_name.upper()} | Cash: ${obs.total_cash:.2f} | Days: {env.max_days}")
160
+ print(f"{'=' * 50}")
161
+
162
+ for day in range(1, env.max_days + 1):
163
+ if obs.done:
164
+ print("Episode ended early.")
165
+ break
166
+
167
+ user_prompt = format_observation(obs)
168
+
169
+ messages = [
170
+ {"role": "system", "content": SYSTEM_PROMPT},
171
+ {"role": "user", "content": user_prompt},
172
+ ]
173
+
174
+ try:
175
+ completion = client.chat.completions.create(
176
+ model=MODEL_NAME,
177
+ messages=messages,
178
+ # temperature=0.2,
179
+ max_completion_tokens=300,
180
+ stream=False,
181
+ )
182
+ response_text = completion.choices[0].message.content or ""
183
+ except Exception as exc:
184
+ print(f" LLM request failed: {exc}. Skipping turn.")
185
+ response_text = "{}"
186
+
187
+ action = parse_action(response_text)
188
+
189
+ print(f"Day {day}: buy={action.buy_quantities} delivery={action.delivery_method} liquidate={action.liquidate}")
190
+
191
+ obs = env.step(action)
192
+
193
+ print(f" Cash: ${obs.total_cash:.2f} | Day Profit: ${obs.day_profit:.2f} | Reward: {obs.reward:.3f}")
194
+
195
+ print(f"Task {task_name} complete | Total profit: ${obs.total_profit:.2f}")
196
+ return obs.total_profit
197
+
198
+
199
+ def main():
200
+ from server.grader import grade_all, compute_baselines
201
+
202
+ client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
203
+
204
+ # print baselines first
205
+ print(f"\n{'=' * 50}")
206
+ print("BASELINES")
207
+ print(f"{'=' * 50}")
208
+ for task_name in ["easy", "medium", "hard"]:
209
+ floor, ceiling = compute_baselines(task_name)
210
+ print(f" {task_name}: floor=${floor:.2f} (passive) | ceiling=${ceiling:.2f} (heuristic)")
211
+
212
+ results = {}
213
+ for task_name in ["easy", "medium", "hard"]:
214
+ profit = run_task(client, task_name)
215
+ results[task_name] = profit
216
+
217
+ scores = grade_all(results)
218
+
219
+ print(f"\n{'=' * 50}")
220
+ print("FINAL SCORES")
221
+ print(f"{'=' * 50}")
222
+ for task_name, score in scores.items():
223
+ floor, ceiling = compute_baselines(task_name)
224
+ print(f" {task_name}: {score:.3f} (profit: ${results[task_name]:.2f} | floor: ${floor:.2f} | ceiling: ${ceiling:.2f})")
225
+
226
+
227
+ if __name__ == "__main__":
228
+ main()
models.py ADDED
@@ -0,0 +1,29 @@
1
+ from __future__ import annotations
2
+
3
+ from openenv.core.env_server import Action, Observation, State
4
+ from typing import Literal, Dict, List, Optional
5
+
6
+
7
+ class InventoryAction(Action):
8
+ buy_quantities : Dict[str, int] = {}
9
+ delivery_method : Literal["slow", "medium", "fast"] = "slow"
10
+ liquidate : Dict[str, int] = {}
11
+
12
+
13
+ class InventoryObservation(Observation):
14
+ current_day : int
15
+ total_cash : float
16
+ day_profit : float
17
+ total_profit : float
18
+ demand_today : Dict[str, int] # product -> units demanded today
19
+ updated_inventory : Dict[str, List[List[Optional[int]]]] # product -> [[qty, days_left], ...] per batch
20
+ remaining_capacity : Dict[str, int] # product -> remaining warehouse space
21
+ updated_events : Dict[str, int]
22
+ updated_deliveries : List[Dict[str, List[int]]] # product name, (quantity of product, days to arrival)
23
+
24
+
25
+ class InventoryState(State):
26
+ episode_id : str
27
+ current_day : int
28
+ cash : float
29
+ inventory : Dict[str, int]
openenv.yaml ADDED
@@ -0,0 +1,6 @@
1
+ spec_version: 1
2
+ name: inventory_env
3
+ type: space
4
+ runtime: fastapi
5
+ app: server.app:app
6
+ port: 8000
pyproject.toml ADDED
@@ -0,0 +1,18 @@
1
+ [project]
2
+ name = "inventory-env"
3
+ version = "0.1.0"
4
+ description = "Retail Inventory Optimization RL Environment for OpenEnv"
5
+ requires-python = ">=3.10"
6
+ dependencies = [
7
+ "openenv-core[core]>=0.2.0",
8
+ "fastapi>=0.115.0",
9
+ "uvicorn>=0.24.0",
10
+ "pydantic>=2.0.0",
11
+ "numpy>=1.24.0",
12
+ "openai>=1.0.0",
13
+ "python-dotenv>=1.0.0",
14
+ ]
15
+
16
+ [build-system]
17
+ requires = ["setuptools"]
18
+ build-backend = "setuptools.build_meta"
server/__init__.py ADDED
File without changes
server/app.py ADDED
@@ -0,0 +1,14 @@
1
+ from openenv.core.env_server import create_app
2
+ from server.inventory_env import InventoryEnvironment
3
+ from models import InventoryAction, InventoryObservation
4
+
5
+ app = create_app(InventoryEnvironment, InventoryAction, InventoryObservation, env_name="inventory_env")
6
+
7
+
8
+ def main():
9
+ import uvicorn
10
+ uvicorn.run(app, host="0.0.0.0", port=8000)
11
+
12
+
13
+ if __name__ == "__main__":
14
+ main()
server/constants.py ADDED
@@ -0,0 +1,178 @@
1
+ INITIAL_CASH = 1000.0
2
+
3
+ # Product name -> base price (selling price before multiplier)
4
+ BASE_PRICES = {
5
+ "electronics": 150.0,
6
+ "clothing": 40.0,
7
+ "groceries": 10.0,
8
+ "furniture": 200.0,
9
+ "toys": 25.0,
10
+ }
11
+
12
+ # Product name -> cost price (what you pay to buy stock)
13
+ COST_PRICES = {
14
+ "electronics": 100.0,
15
+ "clothing": 25.0,
16
+ "groceries": 5.0,
17
+ "furniture": 130.0,
18
+ "toys": 12.0,
19
+ }
20
+
21
+ # Product name -> shelf life in days (None = no expiry)
22
+ SHELF_LIFE = {
23
+ "electronics": None,
24
+ "clothing": None,
25
+ "groceries": 5,
26
+ "furniture": None,
27
+ "toys": None,
28
+ }
29
+
30
+ # Product name -> starting stock quantity
31
+ INITIAL_STOCK = {
32
+ "electronics": 10,
33
+ "clothing": 20,
34
+ "groceries": 50,
35
+ "furniture": 5,
36
+ "toys": 30,
37
+ }
38
+
39
+ # Delivery method -> cost per unit
40
+ SHIPPING_COST = {
41
+ "slow": 2.0,
42
+ "medium": 5.0,
43
+ "fast": 10.0,
44
+ }
45
+
46
+ # Delivery method -> days to arrive
47
+ SHIPPING_DAYS = {
48
+ "slow": 5,
49
+ "medium": 3,
50
+ "fast": 1,
51
+ }
52
+
53
+ # Event name -> days until event (spread across 30 days)
54
+ EVENTS = {
55
+ "black_friday": 6,
56
+ "christmas": 12,
57
+ "back_to_school": 18,
58
+ "summer_clearance": 24,
59
+ "new_competitor": 28,
60
+ }
61
+
62
+ # Product name -> max inventory space (units)
63
+ INVENTORY_CAPACITY = {
64
+ "electronics": 100,
65
+ "clothing": 200,
66
+ "groceries": 500,
67
+ "furniture": 50,
68
+ "toys": 300,
69
+ }
70
+
71
+ # Product name -> additional cost per unit for extra inventory beyond capacity
72
+ EXTRA_INVENTORY_COST = {
73
+ "electronics": 20.0,
74
+ "clothing": 5.0,
75
+ "groceries": 2.0,
76
+ "furniture": 30.0,
77
+ "toys": 4.0,
78
+ }
79
+
80
+ # Product name -> (min_demand, max_demand) per day
81
+ BASE_DEMAND = {
82
+ "electronics": (3, 8),
83
+ "clothing": (5, 15),
84
+ "groceries": (20, 40),
85
+ "furniture": (1, 3),
86
+ "toys": (5, 12),
87
+ }
88
+
89
+ WEEKEND_MULTIPLIER = 1.2
90
+
91
+ # Event name -> {product: demand_multiplier} when event triggers
92
+ EVENT_EFFECTS = {
93
+ "black_friday": {"electronics": 3.0, "clothing": 2.5, "toys": 2.0, "furniture": 1.5, "groceries": 1.0},
94
+ "christmas": {"toys": 3.0, "electronics": 2.0, "clothing": 1.5, "furniture": 1.0, "groceries": 1.5},
95
+ "back_to_school": {"clothing": 2.5, "electronics": 1.5, "toys": 1.5, "furniture": 1.0, "groceries": 1.0},
96
+ "summer_clearance": {"clothing": 2.0, "toys": 1.5, "electronics": 1.0, "furniture": 1.5, "groceries": 1.0},
97
+ "new_competitor": {"electronics": 0.6, "clothing": 0.7, "toys": 0.7, "furniture": 0.8, "groceries": 0.9},
98
+ }
99
+
100
+ EVENT_DURATION = 2
101
+
102
+ MAX_DAYS = 30
103
+
104
+ UPGRADE_DELIVERY_COST = 50.0
105
+
106
+ # Task configs for easy/medium/hard
107
+ TASKS = {
108
+ # Easy: Low starting stock, low steady demand, no events, full warehouse capacity.
109
+ # Agent just needs to maintain stock and sell. Minimal challenge.
110
+ "easy": {
111
+ "seed": 100,
112
+ "max_days": 30,
113
+ "initial_cash": 1000.0,
114
+ "events": {}, # no events
115
+ "initial_stock": {
116
+ "electronics": 5,
117
+ "clothing": 10,
118
+ "groceries": 20,
119
+ "furniture": 3,
120
+ "toys": 10,
121
+ },
122
+ "inventory_capacity": INVENTORY_CAPACITY,
123
+ "base_demand": {
124
+ "electronics": (2, 5),
125
+ "clothing": (3, 10),
126
+ "groceries": (15, 30),
127
+ "furniture": (1, 2),
128
+ "toys": (3, 8),
129
+ },
130
+ },
131
+ # Medium: Default stock/cash, all 5 events spread across 30 days, normal demand.
132
+ # Agent must anticipate demand spikes from events and restock accordingly.
133
+ "medium": {
134
+ "seed": 200,
135
+ "max_days": 30,
136
+ "initial_cash": 1000.0,
137
+ "events": EVENTS,
138
+ "initial_stock": INITIAL_STOCK,
139
+ "inventory_capacity": INVENTORY_CAPACITY,
140
+ "base_demand": BASE_DEMAND,
141
+ },
142
+ # Hard: Half starting cash ($500), low stock, events packed close together,
143
+ # higher demand, smaller warehouse. Agent must balance tight budget,
144
+ # overlapping event spikes, and fast-expiring groceries.
145
+ "hard": {
146
+ "seed": 300,
147
+ "max_days": 30,
148
+ "initial_cash": 500.0,
149
+ "events": {
150
+ "black_friday": 4,
151
+ "christmas": 8,
152
+ "back_to_school": 12,
153
+ "summer_clearance": 16,
154
+ "new_competitor": 20,
155
+ },
156
+ "initial_stock": {
157
+ "electronics": 5,
158
+ "clothing": 10,
159
+ "groceries": 30,
160
+ "furniture": 3,
161
+ "toys": 15,
162
+ },
163
+ "inventory_capacity": {
164
+ "electronics": 50,
165
+ "clothing": 100,
166
+ "groceries": 250,
167
+ "furniture": 25,
168
+ "toys": 150,
169
+ },
170
+ "base_demand": {
171
+ "electronics": (5, 12),
172
+ "clothing": (8, 20),
173
+ "groceries": (30, 60),
174
+ "furniture": (2, 5),
175
+ "toys": (8, 18),
176
+ },
177
+ },
178
+ }
server/grader.py ADDED
@@ -0,0 +1,135 @@
1
+ """
2
+ Grader for inventory optimization tasks.
3
+ Scores agent performance on a 0.0-1.0 scale using floor/ceiling approach.
4
+ - floor: passive agent (no buys, just sells initial stock until empty)
5
+ - ceiling: heuristic agent (keeps a 3-day buffer of average demand, 5 days' worth before events)
6
+ """
7
+
8
+ from server.inventory_env import InventoryEnvironment
9
+ from models import InventoryAction
10
+ from server.constants import TASKS, BASE_PRICES, COST_PRICES, SHIPPING_COST
11
+
12
+
13
+ def _run_passive(task_name):
14
+ """Floor baseline: do nothing, just sell whatever initial stock covers."""
15
+ env = InventoryEnvironment(task_name)
16
+ obs = env.reset()
17
+
18
+ while not obs.done:
19
+ action = InventoryAction(
20
+ buy_quantities={},
21
+ delivery_method="slow",
22
+ liquidate={},
23
+ )
24
+ obs = env.step(action)
25
+
26
+ return obs.total_profit
27
+
28
+
29
+ def _run_heuristic(task_name):
30
+ """Ceiling baseline: smart heuristic that stocks up before events."""
31
+ task = TASKS[task_name]
32
+ env = InventoryEnvironment(task_name)
33
+ obs = env.reset()
34
+
35
+ while not obs.done:
36
+ buy = {}
37
+ delivery = "medium"
38
+ liquidate = {}
39
+
40
+ # check if any event is imminent (within 3 days)
41
+ event_soon = False
42
+ for event, days in obs.updated_events.items():
43
+ if 0 < days <= 3:
44
+ event_soon = True
45
+ break
46
+
47
+ for product, (lo, hi) in task["base_demand"].items():
48
+ avg_demand = (lo + hi) // 2
49
+ current = sum(b[0] for b in obs.updated_inventory.get(product, []))
50
+
51
+ if event_soon:
52
+ # stock up 5 days' worth before events, use fast shipping
53
+ target = avg_demand * 5
54
+ delivery = "fast"
55
+ else:
56
+ # normal: keep 3 days' buffer
57
+ target = avg_demand * 3
58
+
59
+ if current < target:
60
+ buy[product] = target - current
61
+
62
+ # liquidate groceries about to expire (1 day left)
63
+ for batch in obs.updated_inventory.get("groceries", []):
64
+ if batch[1] is not None and batch[1] <= 1:
65
+ liquidate["groceries"] = liquidate.get("groceries", 0) + batch[0]
66
+
67
+ # don't buy on last 2 days
68
+ if obs.current_day >= task["max_days"] - 2:
69
+ buy = {}
70
+
71
+ # don't buy more than cash allows (rough check)
72
+ total_cost = sum(qty * (COST_PRICES[p] + SHIPPING_COST[delivery]) for p, qty in buy.items())
73
+ if total_cost > obs.total_cash * 0.8:
74
+ # scale down proportionally
75
+ scale = (obs.total_cash * 0.8) / total_cost if total_cost > 0 else 0
76
+ buy = {p: max(1, int(qty * scale)) for p, qty in buy.items()}
77
+
78
+ action = InventoryAction(
79
+ buy_quantities=buy,
80
+ delivery_method=delivery,
81
+ liquidate=liquidate,
82
+ )
83
+ obs = env.step(action)
84
+
85
+ return obs.total_profit
86
+
87
+
88
+ def compute_baselines(task_name):
89
+ """Pre-compute floor and ceiling for a task."""
90
+ floor = _run_passive(task_name)
91
+ ceiling = _run_heuristic(task_name)
92
+ return floor, ceiling
93
+
94
+
95
+ def grade(task_name, agent_profit):
96
+ """
97
+ Grade agent performance on 0.0-1.0 scale.
98
+
99
+ Args:
100
+ task_name: "easy", "medium", or "hard"
101
+ agent_profit: total profit achieved by the agent
102
+
103
+ Returns:
104
+ float score between 0.0 and 1.0
105
+ """
106
+ floor, ceiling = compute_baselines(task_name)
107
+
108
+ if ceiling <= floor:
109
+ return 1.0 if agent_profit >= ceiling else 0.0
110
+
111
+ score = (agent_profit - floor) / (ceiling - floor)
112
+ return max(0.0, min(1.0, score))
113
+
114
+
115
+ def grade_all(results):
116
+ """
117
+ Grade all 3 tasks.
118
+
119
+ Args:
120
+ results: dict of {task_name: agent_profit}
121
+
122
+ Returns:
123
+ dict of {task_name: score}
124
+ """
125
+ scores = {}
126
+ for task_name, agent_profit in results.items():
127
+ scores[task_name] = grade(task_name, agent_profit)
128
+ return scores
129
+
130
+
131
+ if __name__ == "__main__":
132
+ print("Computing baselines for all tasks...")
133
+ for task_name in ["easy", "medium", "hard"]:
134
+ floor, ceiling = compute_baselines(task_name)
135
+ print(f" {task_name}: floor={floor:.2f}, ceiling={ceiling:.2f}")
server/inventory_env.py ADDED
@@ -0,0 +1,246 @@
+ from openenv.core.env_server.interfaces import Environment
+ import copy
+ import random
+ from uuid import uuid4
+
+ from models import InventoryAction, InventoryObservation, InventoryState
+ from .constants import (
+     INITIAL_CASH, BASE_PRICES, COST_PRICES, SHELF_LIFE, INITIAL_STOCK,
+     EVENTS, SHIPPING_COST, SHIPPING_DAYS, INVENTORY_CAPACITY,
+     EXTRA_INVENTORY_COST, BASE_DEMAND, WEEKEND_MULTIPLIER, EVENT_EFFECTS,
+     EVENT_DURATION, MAX_DAYS, UPGRADE_DELIVERY_COST, TASKS,
+ )
+
+
+ def _build_inventory(stock):
+     """Convert stock dict to batch format: {product: [[qty, days_left], ...]}"""
+     inv = {}
+     for product, qty in stock.items():
+         shelf = SHELF_LIFE[product]
+         inv[product] = [[qty, shelf]]
+     return inv
+
+
+ class InventoryEnvironment(Environment):
+
+     def __init__(self, task_name="medium"):
+         self.task_name = task_name
+         self.task = TASKS[task_name]
+         self.cash = self.task["initial_cash"]
+         self.inventory = _build_inventory(self.task["initial_stock"])
+         self.events = copy.deepcopy(self.task["events"])
+         self.deliveries = []
+         self.current_day = 0
+         self.total_profit = 0.0
+         self.seed = self.task["seed"]
+         self.reward = 0.0
+         self.max_days = self.task["max_days"]
+         self.inventory_capacity = self.task["inventory_capacity"]
+         self.base_demand = self.task["base_demand"]
+         self.reset()
+
+     def reset(self, seed: int = None) -> InventoryObservation:
+         if seed is not None:
+             self.seed = seed
+         else:
+             self.seed = self.task["seed"]
+         self.cash = self.task["initial_cash"]
+         self.inventory = _build_inventory(self.task["initial_stock"])
+         self.events = copy.deepcopy(self.task["events"])
+         self.deliveries = []
+         self.current_day = 0
+         self.total_profit = 0.0
+         self.reward = 0.0
+
+         self._state = InventoryState(
+             episode_id=str(uuid4()),
+             current_day=0,
+             cash=self.task["initial_cash"],
+             inventory=dict(self.task["initial_stock"])
+         )
+
+         return InventoryObservation(
+             current_day=0,
+             total_cash=self.cash,
+             day_profit=0.0,
+             total_profit=0.0,
+             demand_today={},
+             updated_inventory=copy.deepcopy(self.inventory),
+             remaining_capacity={p: max(0, self.inventory_capacity[p] - sum(b[0] for b in self.inventory[p])) for p in self.inventory},
+             updated_events=copy.deepcopy(self.events),
+             updated_deliveries=[],
+             reward=0.0,
+             done=False,
+         )
+
+     def step(self, action: InventoryAction) -> InventoryObservation:
+         self.current_day += 1
+         self.reward = 0.0  # reset reward each step
+         day_cost = 0.0
+         day_revenue = 0.0
+
+         # 1. tick event countdowns
+         for event_name in self.events:
+             if self.events[event_name] > 0:
+                 self.events[event_name] -= 1
+
+         # 2. remove expired groceries
+         new_batches = []
+         expired_groceries_count = 0
+         for batch in self.inventory["groceries"]:
+             if batch[1] == 0:
+                 expired_groceries_count += batch[0]
+                 continue
+             else:
+                 new_batches.append([batch[0], batch[1] - 1])
+
+         self.inventory["groceries"] = new_batches
+
+         self.reward -= 0.05 * expired_groceries_count
+
+         # 3. Handle incoming deliveries
+         remaining_deliveries = []
+         for delivery in self.deliveries:
+             for product, shipment in delivery.items():
+                 qty, arrival_day = shipment
+                 if arrival_day <= self.current_day:
+                     self.inventory[product].append([qty, SHELF_LIFE[product]])
+                 else:
+                     remaining_deliveries.append(delivery)
+         self.deliveries = remaining_deliveries
+
+         # 4. process purchases
+         for product, qty in action.buy_quantities.items():
+             # skip empty or negative orders; a negative qty would otherwise credit cash
+             if qty <= 0:
+                 continue
+
+             unit_cost = COST_PRICES[product] + SHIPPING_COST[action.delivery_method]
+             total_cost = qty * unit_cost
+
+             # capacity overage cost
+             current_qty = sum(b[0] for b in self.inventory[product])
+             overage = max(0, (current_qty + qty) - self.inventory_capacity[product])
+             extra_cost = overage * EXTRA_INVENTORY_COST[product]
+             total_cost += extra_cost
+
+             if total_cost > self.cash:
+                 self.reward -= 0.5  # penalize for ordering what you can't afford
+                 continue
+
+             self.cash -= total_cost
+             day_cost += total_cost
+
+             arrival_day = self.current_day + SHIPPING_DAYS[action.delivery_method]
+             self.deliveries.append({product: [qty, arrival_day]})
+
+         # 5. generate demand
+         demand = self._generate_demand()
+
+         # 6. sell products (fifo)
+         for product, demand_today in demand.items():
+             product_availability = sum(batch[0] for batch in self.inventory[product])
+
+             if demand_today > product_availability:
+                 # stockout: sell everything on hand and penalize the shortfall
+                 missed_sales = demand_today - product_availability
+                 sold = product_availability
+                 day_revenue += sold * BASE_PRICES[product]
+                 self.inventory[product] = []
+                 self.reward -= missed_sales * BASE_PRICES[product] * 0.001
+                 self.reward += sold * BASE_PRICES[product] * 0.001
+
+             else:
+                 day_revenue += demand_today * BASE_PRICES[product]
+                 self.reward += demand_today * BASE_PRICES[product] * 0.001
+
+                 # consume batches oldest-first
+                 new_batches = []
+                 for batch in self.inventory[product]:
+                     if demand_today == 0:
+                         new_batches.append(batch)
+                     elif batch[0] <= demand_today:
+                         # batch fully consumed; drop it instead of keeping a zero-quantity entry
+                         demand_today -= batch[0]
+                     else:
+                         new_batches.append([batch[0] - demand_today, batch[1]])
+                         demand_today = 0
+
+                 self.inventory[product] = new_batches
+
+         # 7. Liquidate some stock (FIFO, no revenue)
+         total_liquidation_loss = 0.0
+         for product, count in action.liquidate.items():
+             if product not in self.inventory or count <= 0:
+                 continue
+             actually_removed = min(count, sum(b[0] for b in self.inventory[product]))
+             total_liquidation_loss += actually_removed * COST_PRICES[product]
+             remaining = count
+             new_batches = []
+             for batch in self.inventory[product]:
+                 if remaining <= 0:
+                     new_batches.append(batch)
+                 elif batch[0] <= remaining:
+                     remaining -= batch[0]
+                 else:
+                     new_batches.append([batch[0] - remaining, batch[1]])
+                     remaining = 0
+             self.inventory[product] = new_batches
+
+         self.reward -= total_liquidation_loss * 0.001
+
+         # compute day profit
+         day_profit = day_revenue - day_cost
+         self.cash += day_revenue
+         self.total_profit += day_profit
+
+         # check done
+         done = self.current_day >= self.max_days
+
+         # update state
+         self._state = InventoryState(
+             episode_id=self._state.episode_id,
+             current_day=self.current_day,
+             cash=self.cash,
+             inventory={p: sum(b[0] for b in self.inventory[p]) for p in self.inventory},
+         )
+
+         return InventoryObservation(
+             current_day=self.current_day,
+             total_cash=self.cash,
+             day_profit=day_profit,
+             total_profit=self.total_profit,
+             demand_today=demand,
+             updated_inventory=copy.deepcopy(self.inventory),
+             remaining_capacity={p: max(0, self.inventory_capacity[p] - sum(b[0] for b in self.inventory[p])) for p in self.inventory},
+             updated_events=copy.deepcopy(self.events),
+             updated_deliveries=copy.deepcopy(self.deliveries),
+             reward=self.reward,
+             done=done,
+         )
+
+
+     def _generate_demand(self):
+         rng = random.Random(self.seed * 1000 + self.current_day)
+         demand = {}
+
+         for product, (lo, hi) in self.base_demand.items():
+             demand[product] = rng.randint(lo, hi)
+
+         # weekend boost
+         if self.current_day % 7 in (5, 6):
+             for product in demand:
+                 demand[product] = int(demand[product] * WEEKEND_MULTIPLIER)
+
+         # active event multipliers
+         for event_name, days in self.events.items():
+             if days <= 0 and event_name in EVENT_EFFECTS:
+                 for product, mult in EVENT_EFFECTS[event_name].items():
+                     demand[product] = int(demand[product] * mult)
+
+         return demand
+
+
+     @property
+     def state(self) -> InventoryState:
+         return self._state
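Both the sales step and the liquidation step in `step` consume stock from `[qty, days_left]` batches oldest-first. The sketch below isolates that FIFO logic in one function so it can be exercised without the environment. It is an illustration only: `consume_fifo` is a hypothetical helper that does not exist in the commit, and it assumes batches are stored in arrival order, as `_build_inventory` and the delivery handling appear to do.

```python
from typing import List, Tuple

Batch = List[int]  # [quantity, days_left]


def consume_fifo(batches: List[Batch], amount: int) -> Tuple[List[Batch], int]:
    """Remove up to `amount` units, oldest batch first.

    Returns the remaining batches and the number of units actually removed.
    """
    wanted = max(amount, 0)  # treat negative requests as zero
    remaining = wanted
    kept: List[Batch] = []
    for qty, days_left in batches:
        if remaining <= 0:
            # Request already satisfied: keep the batch untouched.
            kept.append([qty, days_left])
        elif qty <= remaining:
            # Batch fully consumed: drop it.
            remaining -= qty
        else:
            # Batch partially consumed: keep the leftover quantity.
            kept.append([qty - remaining, days_left])
            remaining = 0
    return kept, wanted - remaining


stock = [[5, 2], [10, 7]]  # hypothetical batches: 5 units expiring soon, 10 fresher
print(consume_fifo(stock, 8))   # consumes the old batch first, then part of the new one
print(consume_fifo(stock, 20))  # asks for more than is on hand
```

Returning the removed count alongside the leftover batches lets a caller compute revenue or liquidation loss from a single pass instead of summing the inventory separately.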