Spaces: saketh1201/inventory_env


saketh1201 committed on Apr 3

Commit 3764c76 · verified · 1 Parent(s): d97ff83

Upload folder using huggingface_hub


Files changed (8)

Dockerfile +2 -1
README.md +103 -38
client.py +4 -0
inference.py +32 -28
models.py +4 -2
server/constants.py +8 -0
server/grader.py +40 -89
server/inventory_env.py +26 -7

Dockerfile CHANGED

@@ -1,5 +1,7 @@
 FROM ghcr.io/meta-pytorch/openenv-base:latest AS builder
 RUN apt-get update && apt-get install -y git curl && \
     curl -LsSf https://astral.sh/uv/install.sh | sh
 ENV PATH="/root/.local/bin:$PATH"
@@ -25,5 +27,4 @@ EXPOSE 8000
 HEALTHCHECK --interval=30s --timeout=3s \
     CMD curl -f http://localhost:8000/health || exit 1
-ENV ENABLE_WEB_INTERFACE=true
 CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "8000"]

 FROM ghcr.io/meta-pytorch/openenv-base:latest AS builder
+ENV ENABLE_WEB_INTERFACE=true
 RUN apt-get update && apt-get install -y git curl && \
     curl -LsSf https://astral.sh/uv/install.sh | sh
 ENV PATH="/root/.local/bin:$PATH"
 HEALTHCHECK --interval=30s --timeout=3s \
     CMD curl -f http://localhost:8000/health || exit 1
 CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "8000"]

README.md CHANGED

@@ -12,7 +12,11 @@ base_path: /web
 # Retail Inventory Optimization Environment
-An OpenEnv reinforcement learning environment that simulates day-by-day retail inventory management across 5 product categories. An AI agent must decide what to buy, how to ship, and what to liquidate to maximize profit over a 30-day episode.
 ## Environment Description
@@ -26,22 +30,73 @@ You manage a retail store selling 5 products with different characteristics:
 | Furniture | $200 | $130 | $70 | No expiry |
 | Toys | $25 | $12 | $13 | No expiry |
-Each day, customer demand is generated (with weekend boosts and event spikes). The agent must keep stock levels high enough to meet demand while managing cash flow, shipping delays, warehouse capacity, and perishable goods.
 ## Action Space
 ```python
 class InventoryAction(Action):
-    buy_quantities: Dict[str, int] = {}        # product -> quantity to order
     delivery_method: Literal["slow", "medium", "fast"] = "slow"
-    liquidate: Dict[str, int] = {}             # product -> quantity to dispose
 ```
 | Field | Description |
 |-------|-------------|
 | `buy_quantities` | Products and amounts to order. Empty `{}` to skip buying. |
-| `delivery_method` | `"slow"` ($2/unit, 5 days), `"medium"` ($5/unit, 3 days), `"fast"` ($10/unit, 1 day) |
 | `liquidate` | Products and amounts to dispose of (no revenue). Use for expiring groceries or freeing warehouse space. |
 ## Observation Space
@@ -51,56 +106,48 @@ class InventoryObservation(Observation):
     total_cash: float
     day_profit: float
     total_profit: float
-    demand_today: Dict[str, int]
-    updated_inventory: Dict[str, List[List[Optional[int]]]]  # [[qty, days_left], ...]
-    remaining_capacity: Dict[str, int]
-    updated_events: Dict[str, int]
-    updated_deliveries: List[Dict[str, List[int]]]
 ```
-The inventory uses a batch format with FIFO selling: `{"groceries": [[20, 3], [10, 5]]}` means 20 units expiring in 3 days and 10 units expiring in 5 days.
 ## Tasks (Easy / Medium / Hard)
 ### Easy — "Steady State"
 - Low starting stock, low steady demand, no events
 - Starting cash: $1,000 | Full warehouse capacity
 - Agent needs to restock regularly but demand is predictable
 ### Medium — "Seasonal Rush"
 - Default stock/cash, all 5 events spread across 30 days
 - Events: Black Friday (day 6), Christmas (day 12), Back to School (day 18), Summer Clearance (day 24), New Competitor (day 28)
-- Agent must anticipate demand spikes and restock accordingly
 ### Hard — "Chaos Mode"
-- Half starting cash ($500), low stock, events packed close together
-- Higher demand, smaller warehouse capacity
-- Agent must balance tight budget, overlapping event spikes, and fast-expiring groceries
-## Reward Function
-Per-step reward based on multiple signals:
-- **Successful sales**: `+sold_units * sell_price * 0.001` (proportional to revenue)
-- **Missed sales**: `-missed_units * sell_price * 0.001` (proportional to lost revenue)
-- **Expired groceries**: `-0.05 * expired_count`
-- **Failed purchases**: `-0.5` per order that exceeds available cash
-- **Liquidation loss**: `-liquidated_value * 0.001` (proportional to cost of disposed stock)
 ## Grading (0.0 - 1.0)
-Each task is scored by comparing agent profit against two baselines:
-- **Floor**: Passive agent that never buys (sells initial stock until empty)
-- **Ceiling**: Smart heuristic that restocks based on demand and events
 ```
 score = clamp((agent_profit - floor) / (ceiling - floor), 0.0, 1.0)
 ```
 ## Setup
 ```bash
 # Install dependencies
-pip install openenv-core[core] fastapi uvicorn pydantic openai numpy
 # Run grader baselines
 python -c "from server.grader import compute_baselines; [print(f'{t}: floor={f:.2f}, ceiling={c:.2f}') for t in ['easy','medium','hard'] for f,c in [compute_baselines(t)]]"
@@ -116,10 +163,17 @@ curl -X POST http://localhost:8000/reset
 ## Running Inference
 ```bash
 export API_BASE_URL="https://router.huggingface.co/v1"
-export MODEL_NAME="Qwen/Qwen2.5-72B-Instruct"
 export HF_TOKEN="your-token"
 python inference.py
 ```
 ## Docker
@@ -129,21 +183,32 @@ docker build -t inventory-env .
 docker run -p 8000:8000 inventory-env
 ```
 ## Project Structure
 ```
-V2/
-├── models.py              # InventoryAction, InventoryObservation, InventoryState
 ├── client.py              # EnvClient for remote WebSocket connections
-├── inference.py           # LLM inference script (runs all 3 tasks)
 ├── openenv.yaml           # OpenEnv spec manifest
 ├── pyproject.toml         # Python dependencies
-├── Dockerfile             # Container build
 ├── server/
-│   ├── app.py             # FastAPI server (create_app)
-│   ├── inventory_env.py   # Environment (reset, step, state)
-│   ├── constants.py       # Prices, stock, events, task configs
-│   └── grader.py          # Floor/ceiling baselines and scoring
 └── scripts/
     └── validate-submission.sh  # Pre-submission validator
 ```

 # Retail Inventory Optimization Environment
+An OpenEnv reinforcement learning environment that simulates day-by-day retail inventory management across 5 product categories. An AI agent must balance purchasing, pricing, shipping, and liquidation decisions to maximize profit over a 30-day episode.
+## Why Inventory Management?
+Retail inventory optimization is a real-world task performed daily by store managers, warehouse operators, and supply chain planners. The agent faces the same challenges as a human manager: uncertain demand, perishable goods, shipping delays, seasonal events, and limited cash flow. Poor decisions lead to stockouts (lost sales), waste (expired goods), or cash tied up in unsold inventory.
 ## Environment Description
 | Furniture | $200 | $130 | $70 | No expiry |
 | Toys | $25 | $12 | $13 | No expiry |
+Each day the agent receives the current store state (cash, inventory with batch expiry, pending deliveries, upcoming events) and must decide:
+- **What to buy** and how much of each product
+- **How to ship** — slow (cheap but unreliable), medium, or fast (expensive but guaranteed)
+- **What to liquidate** — dispose of expiring or excess stock
+- **How to price** — set per-product price multipliers that affect demand via elasticity
+Customer demand is generated each day based on base ranges, weekend boosts (1.2x on days 5-6), and seasonal event multipliers (up to 3x during Black Friday, Christmas, etc.). The agent cannot see future demand — only yesterday's demand as feedback.
+The episode runs for 30 days. The goal is to maximize total profit.
+## Environment Design Highlights
+### Batch-Tracked Inventory with FIFO
+Inventory is tracked per batch with individual expiry dates. Groceries expire after 5 days. Selling and liquidation follow FIFO (First In, First Out) — oldest batches are consumed first, mimicking real warehouse operations.
+```json
+{"groceries": [[20, 3], [15, 5], [10, 1]]}
+```
+Three batches: 20 units (3 days left), 15 units (5 days left), 10 units (1 day left — liquidate or lose them).
+### Dynamic Pricing with Price Elasticity
+The agent can set per-product price multipliers (0.5x to 1.5x) each day. Demand responds to pricing via realistic elasticity values — groceries are inelastic (people buy regardless), while clothing and toys are highly elastic (price-sensitive customers).
+| Product | Elasticity | Effect of 1.3x price |
+|---------|-----------|----------------------|
+| Electronics | 1.2 | Demand drops ~24% |
+| Clothing | 1.5 | Demand drops ~38% |
+| Groceries | 0.4 | Demand drops only ~11% |
+| Furniture | 0.8 | Demand drops ~22% |
+| Toys | 1.3 | Demand drops ~33% |
+### Delivery Jitter
+Shipping isn't perfectly reliable. Slow delivery has +/-2 day variance, medium has +/-1 day. Only fast delivery (at 5x the cost) is guaranteed next-day. The agent must account for uncertainty when planning restocks before events.
+### Seasonal Events with Demand Spikes
+Five events are spread across the 30-day episode. Each event triggers a 2-day demand multiplier — Black Friday triples electronics demand, Christmas triples toys, etc. A "new competitor" event actually reduces demand. The agent sees countdowns and must stock up in advance.
+### Decomposed Per-Step Reward
+The reward function provides granular feedback every step, not just end-of-episode:
+| Signal | Formula | Purpose |
+|--------|---------|---------|
+| Successful sales | `+sold * sell_price * 0.001` | Reward revenue proportional to product value |
+| Missed sales | `-missed * sell_price * 0.001` | Penalize stockouts, weighted by product value |
+| Expired groceries | `-0.05 * expired_count` | Penalize waste from overbuying perishables |
+| Failed purchases | `-0.5 per rejected order` | Penalize ordering beyond cash budget |
+| Liquidation loss | `-disposed_value * 0.001` | Penalize disposal proportional to cost |
+### Conversation History for LLM Agents
+The inference script maintains a rolling 7-day conversation history. The LLM sees its past observations and decisions, enabling it to spot demand trends, learn from mistakes, and adjust strategy across the episode.
 ## Action Space
 ```python
 class InventoryAction(Action):
+    buy_quantities: Dict[str, int] = {}
     delivery_method: Literal["slow", "medium", "fast"] = "slow"
+    liquidate: Dict[str, int] = {}
+    price_multipliers: Dict[str, float] = {}
 ```
 | Field | Description |
 |-------|-------------|
 | `buy_quantities` | Products and amounts to order. Empty `{}` to skip buying. |
+| `delivery_method` | `"slow"` ($2/unit, 3-7 days), `"medium"` ($5/unit, 2-4 days), `"fast"` ($10/unit, 1 day guaranteed) |
 | `liquidate` | Products and amounts to dispose of (no revenue). Use for expiring groceries or freeing warehouse space. |
+| `price_multipliers` | Per-product selling price multiplier (0.5-1.5). Affects demand via elasticity. Default 1.0 if omitted. |
 ## Observation Space
     total_cash: float
     day_profit: float
     total_profit: float
+    demand_today: Dict[str, int]           # yesterday's demand (feedback)
+    updated_inventory: Dict[str, List]     # [[qty, days_left], ...] per batch
+    remaining_capacity: Dict[str, int]     # warehouse space left per product
+    updated_events: Dict[str, int]         # event countdowns (negative = active/ended)
+    updated_deliveries: List[Dict]         # in-transit shipments
 ```
 ## Tasks (Easy / Medium / Hard)
 ### Easy — "Steady State"
 - Low starting stock, low steady demand, no events
 - Starting cash: $1,000 | Full warehouse capacity
 - Agent needs to restock regularly but demand is predictable
+- No events, no demand spikes — pure supply chain management
 ### Medium — "Seasonal Rush"
 - Default stock/cash, all 5 events spread across 30 days
 - Events: Black Friday (day 6), Christmas (day 12), Back to School (day 18), Summer Clearance (day 24), New Competitor (day 28)
+- Agent must anticipate demand spikes and restock before events hit
 ### Hard — "Chaos Mode"
+- Half starting cash ($500), low stock, events packed close together (days 4, 8, 12, 16, 20)
+- Higher base demand, smaller warehouse capacity
+- Agent must balance tight budget, overlapping event spikes, perishable goods, and limited storage
 ## Grading (0.0 - 1.0)
+Each task is scored by comparing agent profit against two deterministic baselines:
+- **Floor**: Passive agent that never buys (sells initial stock until depleted)
+- **Ceiling**: Theoretical max profit assuming perfect demand knowledge and cheapest shipping
 ```
 score = clamp((agent_profit - floor) / (ceiling - floor), 0.0, 1.0)
 ```
+Both baselines are deterministic (seeded RNG) and computed fresh each run to ensure reproducibility.
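
The scoring rule above can be sketched in a few lines. This is a minimal illustration rather than the repository's `grade` function: the name `normalized_score` is hypothetical, and returning 0.0 when `ceiling <= floor` is an assumption made only to avoid dividing by zero.

```python
def normalized_score(agent_profit: float, floor: float, ceiling: float) -> float:
    """Map agent profit onto [0.0, 1.0] relative to the floor and ceiling baselines."""
    if ceiling <= floor:
        # Assumption: a degenerate baseline pair scores 0.0 instead of raising.
        return 0.0
    raw = (agent_profit - floor) / (ceiling - floor)
    # clamp(raw, 0.0, 1.0)
    return max(0.0, min(1.0, raw))
```

A profit halfway between the two baselines scores 0.5; anything at or below the floor scores 0.0, and anything at or above the ceiling scores 1.0.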
 ## Setup
 ```bash
 # Install dependencies
+pip install openenv-core[core] fastapi uvicorn pydantic openai numpy python-dotenv
 # Run grader baselines
 python -c "from server.grader import compute_baselines; [print(f'{t}: floor={f:.2f}, ceiling={c:.2f}') for t in ['easy','medium','hard'] for f,c in [compute_baselines(t)]]"
 ## Running Inference
 ```bash
+# Using HuggingFace Router
 export API_BASE_URL="https://router.huggingface.co/v1"
+export MODEL_NAME="Qwen/Qwen3-32B"
 export HF_TOKEN="your-token"
 python inference.py
+# Using OpenAI
+export API_BASE_URL="https://api.openai.com/v1"
+export MODEL_NAME="gpt-4o"
+export API_KEY="sk-your-key"
+python inference.py
 ```
 ## Docker
 docker run -p 8000:8000 inventory-env
 ```
+## Step Execution Order
+Each `step()` call processes in this order:
+1. Tick event countdowns (into negatives to track active duration)
+2. Remove expired groceries (shelf life = 0)
+3. Receive arriving deliveries (add to inventory with fresh shelf life)
+4. Process purchase orders (deduct cash, schedule deliveries with jitter)
+5. Generate demand (base + weekend boost + event multipliers + price elasticity)
+6. Sell products FIFO (oldest batches first, track missed sales)
+7. Liquidate requested stock FIFO (no revenue)
+8. Compute profit, reward, update state, return observation
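
Step 6 of the order above (FIFO selling with missed-sales tracking) can be sketched as follows. This is an illustrative version, not the code in `server/inventory_env.py`: the helper name `sell_fifo` is hypothetical, and it assumes the batch list is ordered by arrival, so the front of the list is the oldest batch.

```python
from typing import List, Optional, Tuple

# One batch is [quantity, days_left]; days_left may be None for non-perishables.
Batch = List[Optional[int]]


def sell_fifo(batches: List[Batch], demand: int) -> Tuple[List[Batch], int, int]:
    """Serve demand from the front of the batch list.

    Returns (remaining_batches, units_sold, units_missed).
    """
    remaining: List[Batch] = []
    left = demand
    for qty, days_left in batches:
        take = min(qty, left)      # units taken from this batch
        left -= take
        if qty - take > 0:         # keep only batches that still hold stock
            remaining.append([qty - take, days_left])
    return remaining, demand - left, left
```

With the README's example batches `[[20, 3], [15, 5], [10, 1]]` and a demand of 25, the first batch is emptied and 5 units come out of the second, leaving `[[10, 5], [10, 1]]` with no missed sales.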
 ## Project Structure
 ```
+├── models.py              # InventoryAction, InventoryObservation, InventoryState (Pydantic)
 ├── client.py              # EnvClient for remote WebSocket connections
+├── inference.py           # LLM inference script with conversation history (runs all 3 tasks)
 ├── openenv.yaml           # OpenEnv spec manifest
 ├── pyproject.toml         # Python dependencies
+├── Dockerfile             # Multi-stage container build from openenv-base
 ├── server/
+│   ├── app.py             # FastAPI server (create_app + uvicorn entry point)
+│   ├── inventory_env.py   # Environment (reset, step, state, demand generation)
+│   ├── constants.py       # All configs: prices, stock, events, tasks, elasticity
+│   └── grader.py          # Floor/ceiling baselines and 0.0-1.0 scoring
 └── scripts/
     └── validate-submission.sh  # Pre-submission validator
 ```
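
The decomposed per-step reward listed in the new README can be sketched as a single function. This is a hedged illustration built only from the formulas in that table: the function name `step_reward` and its argument layout are assumptions, and the environment's real implementation may structure the computation differently.

```python
from typing import Dict

# Selling prices as listed in the README and the inference system prompt.
SELL_PRICE: Dict[str, int] = {
    "electronics": 150, "clothing": 40, "groceries": 10, "furniture": 200, "toys": 25,
}


def step_reward(
    sold: Dict[str, int],
    missed: Dict[str, int],
    expired_count: int,
    rejected_orders: int,
    disposed_value: float,
) -> float:
    """Sum the five reward signals from the README table for one step."""
    reward = 0.0
    for product, units in sold.items():
        reward += units * SELL_PRICE[product] * 0.001   # successful sales
    for product, units in missed.items():
        reward -= units * SELL_PRICE[product] * 0.001   # missed sales
    reward -= 0.05 * expired_count                      # expired groceries
    reward -= 0.5 * rejected_orders                     # failed purchases
    reward -= disposed_value * 0.001                    # liquidation loss
    return reward
```

Because sales and stockouts are both scaled by selling price, missing a few high-value units can outweigh selling many low-value ones in the same step.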

client.py CHANGED

@@ -25,6 +25,9 @@ class InventoryEnv(EnvClient[InventoryAction, InventoryObservation, InventorySta
         if action.liquidate is not None:
             payload["liquidate"] = action.liquidate
         return payload
@@ -40,6 +43,7 @@ class InventoryEnv(EnvClient[InventoryAction, InventoryObservation, InventorySta
             total_profit = obs_data.get("total_profit", 0),
             demand_today = obs_data.get("demand_today", {}),
             updated_inventory = obs_data.get("updated_inventory", {}),
             updated_events = obs_data.get("updated_events", {}),
             updated_deliveries = obs_data.get("updated_deliveries", []),
             done = obs_data.get("done", False),

         if action.liquidate is not None:
             payload["liquidate"] = action.liquidate
+        if action.price_multipliers is not None:
+            payload["price_multipliers"] = action.price_multipliers
         return payload
             total_profit = obs_data.get("total_profit", 0),
             demand_today = obs_data.get("demand_today", {}),
             updated_inventory = obs_data.get("updated_inventory", {}),
+            remaining_capacity = obs_data.get("remaining_capacity", {}),
             updated_events = obs_data.get("updated_events", {}),
             updated_deliveries = obs_data.get("updated_deliveries", []),
             done = obs_data.get("done", False),

inference.py CHANGED

@@ -25,8 +25,9 @@ from server.constants import EXTRA_INVENTORY_COST, EVENT_DURATION
 from models import InventoryAction
 API_BASE_URL = os.getenv("API_BASE_URL") or "https://router.huggingface.co/v1"
-API_KEY = os.getenv("API_KEY") or os.getenv("HF_TOKEN")
-MODEL_NAME = os.getenv("MODEL_NAME")
 MAX_DAYS = 30
 SYSTEM_PROMPT = textwrap.dedent("""
@@ -42,7 +43,7 @@ SYSTEM_PROMPT = textwrap.dedent("""
     Product selling prices: electronics=$150, clothing=$40, groceries=$10, furniture=$200, toys=$25
     Product cost prices: electronics=$100, clothing=$25, groceries=$5, furniture=$130, toys=$12
     Profit margins: electronics=$50, clothing=$15, groceries=$5, furniture=$70, toys=$13
-    Shipping costs per unit: slow=$2 (5 days), medium=$5 (3 days), fast=$10 (1 day)
     Warehouse capacity: electronics=100, clothing=200, groceries=500, furniture=50, toys=300
     Events (like black_friday, christmas) boost demand when their countdown hits 0 and last for 2 days.
@@ -50,25 +51,33 @@ SYSTEM_PROMPT = textwrap.dedent("""
     CRITICAL STRATEGY:
     - Review your history: if reward was negative, identify why and change approach.
-    - Track demand trends across days — if a product's demand is rising, stock up early.
     - You MUST restock products when inventory is low. Missed sales = lost revenue = negative reward.
     - Do NOT overbuy when demand is low — unsold stock ties up cash and perishables expire.
-    - Prioritize high-margin products: furniture ($70 profit), electronics ($50 profit).
-    - Stock up BEFORE events hit (check event countdowns — order 3-5 days ahead using slow/medium shipping).
     - When no events are approaching, slow shipping is often sufficient and saves significant cost.
     - Near end of episode (last 2 days), stop buying — focus on selling remaining stock.
     Each day you must respond with a JSON action:
     {
         "buy_quantities": {"product_name": quantity, ...},
         "delivery_method": "slow" | "medium" | "fast",
-        "liquidate": {"product_name": quantity, ...}
     }
     - buy_quantities: products and amounts to order.
     - delivery_method: shipping speed for this order
     - liquidate: products and amounts to dispose of (no revenue, empty {} to skip)
       Use liquidate to free up warehouse space before a restock.
     LEARNING FROM HISTORY:
     - Compare your past buy quantities to the demand that followed — were you over or under?
@@ -182,6 +191,8 @@ def parse_action(response_text):
             clean["delivery_method"] = data["delivery_method"]
         if "liquidate" in data:
             clean["liquidate"] = data["liquidate"]
         return InventoryAction(**clean)
     except Exception as e:
@@ -191,10 +202,11 @@ def parse_action(response_text):
             buy_quantities={},
             delivery_method="slow",
             liquidate={},
-        )
-HISTORY_WINDOW = 15  # rolling window of past days to include in context
 def run_task(client, task_name):
@@ -242,7 +254,7 @@ def run_task(client, task_name):
                 model=MODEL_NAME,
                 messages=messages,
                 temperature=0.0,
-                max_completion_tokens=300,
                 stream=False,
             )
             response_text = completion.choices[0].message.content or ""
@@ -255,7 +267,7 @@ def run_task(client, task_name):
         action = parse_action(response_text)
-        print(f"Day {day}: buy={action.buy_quantities} delivery={action.delivery_method} liquidate={action.liquidate}")
         obs = env.step(action)
@@ -266,35 +278,27 @@ def run_task(client, task_name):
 def main():
-    from server.grader import grade_all, compute_baselines
     if not MODEL_NAME:
         raise RuntimeError("MODEL_NAME is not set. Please export MODEL_NAME before running inference.")
     client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
-    # print baselines first
     print(f"\n{'=' * 50}")
-    print("BASELINES")
     print(f"{'=' * 50}")
-    for task_name in ["easy", "medium", "hard"]:
-        floor, ceiling = compute_baselines(task_name)
-        print(f"  {task_name}: floor=${floor:.2f} (passive) | ceiling=${ceiling:.2f} (heuristic)")
-    results = {}
-    for task_name in ["easy", "medium", "hard"]:
-        profit = run_task(client, task_name)
-        results[task_name] = profit
-    scores = grade_all(results)
     print(f"\n{'=' * 50}")
-    print("FINAL SCORES")
     print(f"{'=' * 50}")
-    for task_name, score in scores.items():
-        floor, ceiling = compute_baselines(task_name)
-        print(f"  {task_name}: {score:.3f} (profit: ${results[task_name]:.2f} | floor: ${floor:.2f} | ceiling: ${ceiling:.2f})")
 if __name__ == "__main__":
-    main()

 from models import InventoryAction
 API_BASE_URL = os.getenv("API_BASE_URL") or "https://router.huggingface.co/v1"
+API_KEY = os.getenv("API_KEY") or os.getenv("HF_TOKEN") or os.getenv("OPENAI_API_KEY")
+MODEL_NAME = os.getenv("MODEL_NAME") or "Qwen/Qwen3-32B"
+TASK_NAME = os.getenv("TASK_NAME") or "easy"
 MAX_DAYS = 30
 SYSTEM_PROMPT = textwrap.dedent("""
     Product selling prices: electronics=$150, clothing=$40, groceries=$10, furniture=$200, toys=$25
     Product cost prices: electronics=$100, clothing=$25, groceries=$5, furniture=$130, toys=$12
     Profit margins: electronics=$50, clothing=$15, groceries=$5, furniture=$70, toys=$13
+    Shipping costs per unit: slow=$2 (3-7 days), medium=$5 (2-4 days), fast=$10 (1 day, always reliable)
     Warehouse capacity: electronics=100, clothing=200, groceries=500, furniture=50, toys=300
     Events (like black_friday, christmas) boost demand when their countdown hits 0 and last for 2 days.
     CRITICAL STRATEGY:
     - Review your history: if reward was negative, identify why and change approach.
+    - Track demand trends across days.
     - You MUST restock products when inventory is low. Missed sales = lost revenue = negative reward.
     - Do NOT overbuy when demand is low — unsold stock ties up cash and perishables expire.
+    - Stock up BEFORE events hit (check event countdowns — order 3-5 days ahead).
     - When no events are approaching, slow shipping is often sufficient and saves significant cost.
     - Near end of episode (last 2 days), stop buying — focus on selling remaining stock.
+    DYNAMIC PRICING:
+    You can set a price multiplier (0.5 to 1.5) per product each day. Default is 1.0.
+    - Lower price (e.g. 0.7) = more demand but less revenue per unit. Good for clearing excess stock.
+    - Higher price (e.g. 1.3) = less demand but more revenue per unit. Good when stock is low.
+    - Price elasticity varies across different products.
+    - Elasticity values: electronics=1.2, clothing=1.5, groceries=0.4, furniture=0.8, toys=1.3
     Each day you must respond with a JSON action:
     {
         "buy_quantities": {"product_name": quantity, ...},
         "delivery_method": "slow" | "medium" | "fast",
+        "liquidate": {"product_name": quantity, ...},
+        "price_multipliers": {"product_name": multiplier, ...}
     }
     - buy_quantities: products and amounts to order.
     - delivery_method: shipping speed for this order
     - liquidate: products and amounts to dispose of (no revenue, empty {} to skip)
       Use liquidate to free up warehouse space before a restock.
+    - price_multipliers: set selling price multiplier per product (0.5-1.5, default 1.0 if omitted)
     LEARNING FROM HISTORY:
     - Compare your past buy quantities to the demand that followed — were you over or under?
             clean["delivery_method"] = data["delivery_method"]
         if "liquidate" in data:
             clean["liquidate"] = data["liquidate"]
+        if "price_multipliers" in data:
+            clean["price_multipliers"] = data["price_multipliers"]
         return InventoryAction(**clean)
     except Exception as e:
             buy_quantities={},
             delivery_method="slow",
             liquidate={},
+            price_multipliers={},
+        )
+HISTORY_WINDOW = 7  # rolling window of past days to include in context
 def run_task(client, task_name):
                 model=MODEL_NAME,
                 messages=messages,
                 temperature=0.0,
+                max_completion_tokens=500,
                 stream=False,
             )
             response_text = completion.choices[0].message.content or ""
         action = parse_action(response_text)
+        print(f"Day {day}: buy={action.buy_quantities} delivery={action.delivery_method} liquidate={action.liquidate} prices={action.price_multipliers}")
         obs = env.step(action)
 def main():
+    from server.grader import grade, compute_baselines
     if not MODEL_NAME:
         raise RuntimeError("MODEL_NAME is not set. Please export MODEL_NAME before running inference.")
     client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
+    # print baseline for selected task
+    floor, ceiling = compute_baselines(TASK_NAME)
     print(f"\n{'=' * 50}")
+    print(f"BASELINE ({TASK_NAME}): floor=${floor:.2f} (passive) | ceiling=${ceiling:.2f} (heuristic)")
     print(f"{'=' * 50}")
+    profit = run_task(client, TASK_NAME)
+    score = grade(TASK_NAME, profit)
     print(f"\n{'=' * 50}")
+    print("FINAL SCORE")
     print(f"{'=' * 50}")
+    print(f"  {TASK_NAME}: {score:.3f} (profit: ${profit:.2f} | floor: ${floor:.2f} | ceiling: ${ceiling:.2f})")
 if __name__ == "__main__":
+    main()

models.py CHANGED

@@ -1,16 +1,18 @@
 from __future__ import annotations
 from openenv.core.env_server import Action, Observation, State
 from typing import Literal, Dict, List, Optional
 from pydantic import field_validator
 class InventoryAction(Action):
     buy_quantities : Dict[str, int] = {}
     delivery_method : Literal["slow", "medium", "fast"] = "slow"
     liquidate : Dict[str, int] = {}
-    @field_validator("buy_quantities", "liquidate", mode="before")
     @classmethod
     def parse_dict_strings(cls, v):
         if isinstance(v, str):

 from __future__ import annotations
+import json
 from openenv.core.env_server import Action, Observation, State
 from typing import Literal, Dict, List, Optional
 from pydantic import field_validator
 class InventoryAction(Action):
     buy_quantities : Dict[str, int] = {}
     delivery_method : Literal["slow", "medium", "fast"] = "slow"
     liquidate : Dict[str, int] = {}
+    price_multipliers : Dict[str, float] = {}  # product -> 0.5 to 1.5 (default 1.0)
+    @field_validator("buy_quantities", "liquidate", "price_multipliers", mode="before")
     @classmethod
     def parse_dict_strings(cls, v):
         if isinstance(v, str):

server/constants.py CHANGED

@@ -175,4 +175,12 @@ TASKS = {
             "toys": (8, 18),
         },
     },
 }

             "toys": (8, 18),
         },
     },
+}
+PRICE_ELASTICITY = {
+    "electronics": 1.2,
+    "clothing":    1.5,
+    "groceries":   0.4,
+    "furniture":   0.8,
+    "toys":        1.3,
 }
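
To show how these elasticity values could feed into demand, here is a minimal sketch. The diff does not show the formula used in `server/inventory_env.py`, so the constant-elasticity form `demand * multiplier ** (-elasticity)`, the clamping to the 0.5-1.5 range, and the helper names are all assumptions; results under this form need not match the percentages quoted in the README table.

```python
PRICE_ELASTICITY = {
    "electronics": 1.2,
    "clothing": 1.5,
    "groceries": 0.4,
    "furniture": 0.8,
    "toys": 1.3,
}


def clamp_multiplier(multiplier: float) -> float:
    """Keep the multiplier inside the documented 0.5-1.5 range (assumed behavior)."""
    return max(0.5, min(1.5, multiplier))


def adjusted_demand(product: str, base_demand: int, multiplier: float = 1.0) -> int:
    """Scale demand by an assumed constant-elasticity curve."""
    m = clamp_multiplier(multiplier)
    return int(base_demand * m ** (-PRICE_ELASTICITY[product]))
```

Under this assumed curve a multiplier of 1.0 leaves demand unchanged, and at the same price increase a high-elasticity product such as clothing loses more demand than a low-elasticity one such as groceries.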

server/grader.py CHANGED

@@ -2,12 +2,17 @@
 Grader for inventory optimization tasks.
 Scores agent performance on a 0.0-1.0 scale using floor/ceiling approach.
   - floor: passive agent (no buys, just sells initial stock until empty)
-  - ceiling: heuristic agent (buys to meet average demand each day)
 """
 from server.inventory_env import InventoryEnvironment
 from models import InventoryAction
-from server.constants import TASKS, BASE_PRICES, COST_PRICES, SHIPPING_COST
 def _run_passive(task_name):
@@ -27,100 +32,46 @@ def _run_passive(task_name):
 def _run_heuristic(task_name):
-    """Ceiling baseline: smart heuristic that stocks up before events."""
     task = TASKS[task_name]
-    env = InventoryEnvironment(task_name)
-    obs = env.reset()
-    # track recent demand to adapt ordering
-    demand_history = {}
-    while not obs.done:
-        buy = {}
-        liquidate = {}
-        # determine nearest event distance
-        nearest_event_days = 999
-        for event, days in obs.updated_events.items():
-            if 0 < days < nearest_event_days:
-                nearest_event_days = days
-        # pick shipping based on urgency
-        if nearest_event_days <= 2:
-            delivery = "fast"
-        elif nearest_event_days <= 5:
-            delivery = "medium"
-        else:
-            delivery = "slow"
-        # update demand history from observation
-        if obs.demand_today:
-            for product, units in obs.demand_today.items():
-                if product not in demand_history:
-                    demand_history[product] = []
-                demand_history[product].append(units)
         for product, (lo, hi) in task["base_demand"].items():
-            avg_demand = (lo + hi) // 2
-            # use recent demand if available (last 5 days)
-            if product in demand_history and len(demand_history[product]) >= 2:
-                recent = demand_history[product][-5:]
-                avg_demand = max(avg_demand, int(sum(recent) / len(recent)))
-            current = sum(b[0] for b in obs.updated_inventory.get(product, []))
-            # count in-transit units
-            in_transit = 0
-            for d in obs.updated_deliveries:
-                for p, shipment in d.items():
-                    if p == product:
-                        in_transit += shipment[0]
-            available = current + in_transit
-            # how many days of stock to target
-            if nearest_event_days <= 5:
-                target = avg_demand * 6
-            else:
-                target = avg_demand * 4
-            # prioritize high-margin products — order more aggressively
-            margin = BASE_PRICES[product] - COST_PRICES[product]
-            if margin >= 50:  # electronics, furniture
-                target = int(target * 1.3)
-            if available < target:
-                buy[product] = target - available
-        # liquidate groceries about to expire (1 day left)
-        for batch in obs.updated_inventory.get("groceries", []):
-            if batch[1] is not None and batch[1] <= 1:
-                liquidate["groceries"] = liquidate.get("groceries", 0) + batch[0]
-        # stop buying when deliveries can't arrive in time
-        days_left = task["max_days"] - obs.current_day
-        if delivery == "slow" and days_left <= 5:
-            buy = {}
-        elif delivery == "medium" and days_left <= 3:
-            buy = {}
-        elif delivery == "fast" and days_left <= 1:
-            buy = {}
-        # don't buy more than cash allows (rough check)
-        total_cost = sum(qty * (COST_PRICES[p] + SHIPPING_COST[delivery]) for p, qty in buy.items())
-        if total_cost > obs.total_cash * 0.85:
-            scale = (obs.total_cash * 0.85) / total_cost if total_cost > 0 else 0
-            buy = {p: max(1, int(qty * scale)) for p, qty in buy.items()}
-        action = InventoryAction(
-            buy_quantities=buy,
-            delivery_method=delivery,
-            liquidate=liquidate,
-        )
-        obs = env.step(action)
-    return obs.total_profit
 def compute_baselines(task_name):

 Grader for inventory optimization tasks.
 Scores agent performance on a 0.0-1.0 scale using floor/ceiling approach.
   - floor: passive agent (no buys, just sells initial stock until empty)
+  - ceiling: theoretical max profit with perfect demand knowledge
 """
 from server.inventory_env import InventoryEnvironment
 from models import InventoryAction
+from server.constants import (
+    TASKS, BASE_PRICES, COST_PRICES, SHIPPING_COST, EVENT_EFFECTS,
+    WEEKEND_MULTIPLIER, EVENT_DURATION,
+)
+import random
 def _run_passive(task_name):
 def _run_heuristic(task_name):
     task = TASKS[task_name]
+    events = dict(task["events"])
+    total_demand = {p: 0 for p in task["base_demand"]}
+    for day in range(1, task["max_days"] + 1):
+        # tick events
+        for event_name in events:
+            events[event_name] -= 1
+        rng = random.Random(task["seed"] * 1000 + day)
         for product, (lo, hi) in task["base_demand"].items():
+            demand = rng.randint(lo, hi)
+            # weekend boost
+            if day % 7 == 5 or day % 7 == 6:
+                demand = int(WEEKEND_MULTIPLIER * demand)
+            # event multipliers
+            for event_name, days_left in events.items():
+                if -EVENT_DURATION < days_left <= 0 and event_name in EVENT_EFFECTS:
+                    mult = EVENT_EFFECTS[event_name].get(product, 1.0)
+                    demand = int(demand * mult)
+            total_demand[product] += demand
+    total_profit = 0.0
+    # sell the initial stock first
+    initial_stock = task["initial_stock"]
+    for product in task["base_demand"]:
+        total_profit += min(initial_stock.get(product, 0), total_demand[product]) * BASE_PRICES[product]
+        total_demand[product] = max(0, total_demand[product] - initial_stock.get(product, 0))
+        # cost price and shipping cost apply after initial stock
+        total_profit += total_demand[product] * (BASE_PRICES[product] - COST_PRICES[product] - SHIPPING_COST["slow"])
+    return total_profit
 def compute_baselines(task_name):
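
Reviewer note on the new ceiling in `_run_heuristic`: the per-product arithmetic and the floor/ceiling scaling described in the docstring can be sketched as below. This is a minimal sketch, not the grader's code. The helper names `ceiling_contribution` and `normalize_score` are hypothetical, the linear interpolation with clamping is an assumption (the scoring formula is not visible in this hunk), and the slow shipping cost of 1.0 in the usage lines is a made-up value.

```python
def ceiling_contribution(initial_stock, demand, price, cost, slow_shipping):
    """Upper-bound profit for one product, mirroring the arithmetic in the diff.

    Initial stock is sold at full price with no purchase cost; any demand left
    over is assumed to be met by units bought at cost and shipped "slow".
    """
    from_stock = min(initial_stock, demand) * price
    remaining = max(0, demand - initial_stock)
    from_orders = remaining * (price - cost - slow_shipping)
    return from_stock + from_orders


def normalize_score(profit, floor, ceiling):
    """Map profit onto [0.0, 1.0] between floor and ceiling (assumed linear)."""
    if ceiling <= floor:
        # Degenerate baselines: avoid dividing by zero or a negative range.
        return 0.0
    return max(0.0, min(1.0, (profit - floor) / (ceiling - floor)))


# Usage with toy prices from the README table ($25 price, $12 cost)
# and a hypothetical slow shipping cost of 1.0 per unit.
toys_ceiling = ceiling_contribution(10, 25, 25.0, 12.0, 1.0)
score = normalize_score(430.0, floor=250.0, ceiling=610.0)
```

Because the ceiling ignores cash limits, warehouse capacity, spoilage, and delivery timing, an agent should not be expected to reach a score of 1.0 under this scheme.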

server/inventory_env.py CHANGED Viewed

@@ -8,7 +8,7 @@ from .constants import (
     INITIAL_CASH, BASE_PRICES, COST_PRICES, SHELF_LIFE, INITIAL_STOCK,
     EVENTS, SHIPPING_COST, SHIPPING_DAYS, INVENTORY_CAPACITY,
     EXTRA_INVENTORY_COST, BASE_DEMAND, WEEKEND_MULTIPLIER, EVENT_EFFECTS,
-    EVENT_DURATION, MAX_DAYS, UPGRADE_DELIVERY_COST, TASKS,
 )
@@ -128,28 +128,45 @@ class InventoryEnvironment(Environment):
             day_cost += total_cost
             arrival_day = self.current_day + SHIPPING_DAYS[action.delivery_method]
             self.deliveries.append({product: [qty, arrival_day]})
         # 5. generate demand
         demand = self._generate_demand()
         # 6. sell products (fifo)
         for product, demand_today in demand.items():
             product_availability = sum(batch[0] for batch in self.inventory[product])
             if demand_today > product_availability:
                 missed_sales = demand_today - product_availability
                 sold = product_availability
-                day_revenue += sold * BASE_PRICES[product]
                 self.inventory[product] = []
-                self.reward -= missed_sales * BASE_PRICES[product] * 0.001
-                self.reward += sold * BASE_PRICES[product] * 0.001
             else:
-                day_revenue += demand_today * BASE_PRICES[product]
-                self.reward += demand_today * BASE_PRICES[product] * 0.001
                 new_batches = []
@@ -162,7 +179,9 @@ class InventoryEnvironment(Environment):
                         new_batches.append(batch)
                     else:
-                        new_batches.append([batch[0] - demand_today, batch[1]])
                         demand_today = 0
                 self.inventory[product] = new_batches

     INITIAL_CASH, BASE_PRICES, COST_PRICES, SHELF_LIFE, INITIAL_STOCK,
     EVENTS, SHIPPING_COST, SHIPPING_DAYS, INVENTORY_CAPACITY,
     EXTRA_INVENTORY_COST, BASE_DEMAND, WEEKEND_MULTIPLIER, EVENT_EFFECTS,
+    EVENT_DURATION, MAX_DAYS, UPGRADE_DELIVERY_COST, TASKS, PRICE_ELASTICITY
 )
             day_cost += total_cost
             arrival_day = self.current_day + SHIPPING_DAYS[action.delivery_method]
+            # add jitter: slow ±2 days, medium ±1 day, fast is reliable
+            jitter_rng = random.Random(self.seed * 2000 + self.current_day * 100 + hash(product))
+            if action.delivery_method == "slow":
+                arrival_day += jitter_rng.randint(-2, 2)
+            elif action.delivery_method == "medium":
+                arrival_day += jitter_rng.randint(-1, 1)
+            # ensure arrival is at least next day
+            arrival_day = max(self.current_day + 1, arrival_day)
             self.deliveries.append({product: [qty, arrival_day]})
         # 5. generate demand
         demand = self._generate_demand()
+        # apply price elasticity: demand scales with price^(-elasticity)
+        price_mults = {}
+        for product in demand:
+            pm = max(0.5, min(1.5, action.price_multipliers.get(product, 1.0)))
+            price_mults[product] = pm
+            e = PRICE_ELASTICITY[product]
+            demand[product] = max(0, int(demand[product] * pm ** -e))
         # 6. sell products (fifo)
         for product, demand_today in demand.items():
+            sell_price = BASE_PRICES[product] * price_mults[product]
             product_availability = sum(batch[0] for batch in self.inventory[product])
             if demand_today > product_availability:
                 missed_sales = demand_today - product_availability
                 sold = product_availability
+                day_revenue += sold * sell_price
                 self.inventory[product] = []
+                self.reward -= missed_sales * sell_price * 0.001
+                self.reward += sold * sell_price * 0.001
             else:
+                day_revenue += demand_today * sell_price
+                self.reward += demand_today * sell_price * 0.001
                 new_batches = []
                         new_batches.append(batch)
                     else:
+                        remaining = batch[0] - demand_today
+                        if remaining > 0:
+                            new_batches.append([remaining, batch[1]])
                         demand_today = 0
                 self.inventory[product] = new_batches
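
Reviewer note on the new demand and delivery steps: the sketch below isolates the price-elasticity scaling from the diff and shows one possible way to make the jitter seed reproducible. The elasticity part follows the committed expression (`pm` clamped to [0.5, 1.5], then `demand * pm ** -e`). The seed part is a suggestion rather than what was committed: Python salts `hash()` for strings per process unless `PYTHONHASHSEED` is fixed, so `hash(product)` may give different jitter between runs. The helper names and the elasticity values used here are hypothetical.

```python
import random
import zlib


def elastic_demand(base_demand, price_multiplier, elasticity):
    """Scale demand by price^(-elasticity), as in the committed step."""
    # Clamp the multiplier to the same [0.5, 1.5] range used in the diff.
    pm = max(0.5, min(1.5, price_multiplier))
    return max(0, int(base_demand * pm ** -elasticity))


def jitter_seed(env_seed, current_day, product):
    """Deterministic replacement for the hash(product) term (assumption).

    zlib.crc32 returns the same value for the same bytes in every process,
    unlike the built-in hash() on str.
    """
    return env_seed * 2000 + current_day * 100 + zlib.crc32(product.encode("utf-8"))


# Usage: a 50% markup with a hypothetical elasticity of 2.0 cuts demand sharply,
# while a 50% discount with elasticity 1.0 doubles it.
high_price = elastic_demand(100, 1.5, 2.0)
low_price = elastic_demand(100, 0.5, 1.0)

# Slow shipping jitter of +/-2 days drawn from a reproducible seed.
rng = random.Random(jitter_seed(7, 3, "toys"))
offset = rng.randint(-2, 2)
```

If reproducible episodes matter for grading, a stable seed of this kind keeps delivery jitter identical across processes for the same task seed, day, and product.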