Spaces: saketh1201/inventory_env

saketh1201 committed on Mar 30

Commit ab7c428 (verified) · 1 Parent(s): 2923e9c

Upload folder using huggingface_hub

Files changed (14)

Dockerfile +29 -0
README.md +149 -10
__init__.py +0 -0
client.py +67 -0
curl +0 -0
inference.py +228 -0
models.py +29 -0
openenv.yaml +6 -0
pyproject.toml +18 -0
server/__init__.py +0 -0
server/app.py +14 -0
server/constants.py +178 -0
server/grader.py +135 -0
server/inventory_env.py +246 -0

Dockerfile ADDED

	@@ -0,0 +1,29 @@

+FROM ghcr.io/meta-pytorch/openenv-base:latest AS builder
+RUN apt-get update && apt-get install -y git curl && \
+    curl -LsSf https://astral.sh/uv/install.sh | sh
+ENV PATH="/root/.local/bin:$PATH"
+WORKDIR /app
+COPY pyproject.toml uv.lock* ./
+RUN uv sync --frozen || uv sync
+COPY . .
+RUN uv sync
+FROM ghcr.io/meta-pytorch/openenv-base:latest
+WORKDIR /app
+COPY --from=builder /app/.venv /app/.venv
+COPY --from=builder /app /app
+ENV PATH="/app/.venv/bin:$PATH"
+ENV PYTHONUNBUFFERED=1
+ENV PYTHONPATH="/app:$PYTHONPATH"
+EXPOSE 8000
+HEALTHCHECK --interval=30s --timeout=3s \
+    CMD curl -f http://localhost:8000/health || exit 1
+ENV ENABLE_WEB_INTERFACE=true
+CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "8000"]

README.md CHANGED

@@ -1,10 +1,149 @@
----
-title: Inventory Env
-emoji: 🔥
-colorFrom: purple
-colorTo: blue
-sdk: docker
-pinned: false
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+---
+title: Inventory Optimization Environment
+emoji: 📦
+colorFrom: blue
+colorTo: green
+sdk: docker
+app_port: 8000
+tags:
+  - openenv
+base_path: /web
+---
+# Retail Inventory Optimization Environment
+An OpenEnv reinforcement learning environment that simulates day-by-day retail inventory management across 5 product categories. An AI agent must decide what to buy, how to ship, and what to liquidate to maximize profit over a 30-day episode.
+## Environment Description
+You manage a retail store selling 5 products with different characteristics:
+| Product | Sell Price | Cost Price | Profit Margin | Shelf Life |
+|---------|-----------|------------|---------------|------------|
+| Electronics | $150 | $100 | $50 | No expiry |
+| Clothing | $40 | $25 | $15 | No expiry |
+| Groceries | $10 | $5 | $5 | 5 days |
+| Furniture | $200 | $130 | $70 | No expiry |
+| Toys | $25 | $12 | $13 | No expiry |
+Each day, customer demand is generated (with weekend boosts and event spikes). The agent must keep stock levels high enough to meet demand while managing cash flow, shipping delays, warehouse capacity, and perishable goods.
+## Action Space
+```python
+class InventoryAction(Action):
+    buy_quantities: Dict[str, int] = {}        # product -> quantity to order
+    delivery_method: Literal["slow", "medium", "fast"] = "slow"
+    liquidate: Dict[str, int] = {}             # product -> quantity to dispose
+```
+| Field | Description |
+|-------|-------------|
+| `buy_quantities` | Products and amounts to order. Empty `{}` to skip buying. |
+| `delivery_method` | `"slow"` ($2/unit, 5 days), `"medium"` ($5/unit, 3 days), `"fast"` ($10/unit, 1 day) |
+| `liquidate` | Products and amounts to dispose of (no revenue). Use for expiring groceries or freeing warehouse space. |
+## Observation Space
+```python
+class InventoryObservation(Observation):
+    current_day: int
+    total_cash: float
+    day_profit: float
+    total_profit: float
+    demand_today: Dict[str, int]
+    updated_inventory: Dict[str, List[List[Optional[int]]]]  # [[qty, days_left], ...]
+    remaining_capacity: Dict[str, int]
+    updated_events: Dict[str, int]
+    updated_deliveries: List[Dict[str, List[int]]]
+```
+The inventory uses a batch format with FIFO selling: `{"groceries": [[20, 3], [10, 5]]}` means 20 units expiring in 3 days and 10 units expiring in 5 days.
+## Tasks (Easy / Medium / Hard)
+### Easy — "Steady State"
+- Low starting stock, low steady demand, no events
+- Starting cash: $1,000 | Full warehouse capacity
+- Agent needs to restock regularly but demand is predictable
+### Medium — "Seasonal Rush"
+- Default stock/cash, all 5 events spread across 30 days
+- Events: Black Friday (day 6), Christmas (day 12), Back to School (day 18), Summer Clearance (day 24), New Competitor (day 28)
+- Agent must anticipate demand spikes and restock accordingly
+### Hard — "Chaos Mode"
+- Half starting cash ($500), low stock, events packed close together
+- Higher demand, smaller warehouse capacity
+- Agent must balance tight budget, overlapping event spikes, and fast-expiring groceries
+## Reward Function
+Per-step reward based on multiple signals:
+- **Successful sales**: `+sold_units * sell_price * 0.001` (proportional to revenue)
+- **Missed sales**: `-missed_units * sell_price * 0.001` (proportional to lost revenue)
+- **Expired groceries**: `-0.05 * expired_count`
+- **Failed purchases**: `-0.5` per order that exceeds available cash
+- **Liquidation loss**: `-liquidated_value * 0.001` (proportional to cost of disposed stock)
+## Grading (0.0 - 1.0)
+Each task is scored by comparing agent profit against two baselines:
+- **Floor**: Passive agent that never buys (sells initial stock until empty)
+- **Ceiling**: Smart heuristic that restocks based on demand and events
+```
+score = clamp((agent_profit - floor) / (ceiling - floor), 0.0, 1.0)
+```
+## Setup
+```bash
+# Install dependencies
+pip install "openenv-core[core]" fastapi uvicorn pydantic openai numpy python-dotenv
+# Run grader baselines
+python -c "from server.grader import compute_baselines; [print(f'{t}: floor={f:.2f}, ceiling={c:.2f}') for t in ['easy','medium','hard'] for f,c in [compute_baselines(t)]]"
+# Start server locally
+uvicorn server.app:app --host 0.0.0.0 --port 8000
+# Test endpoints
+curl http://localhost:8000/health
+curl -X POST http://localhost:8000/reset
+```
+## Running Inference
+```bash
+export API_BASE_URL="https://router.huggingface.co/v1"
+export MODEL_NAME="Qwen/Qwen2.5-72B-Instruct"
+export HF_TOKEN="your-token"
+python inference.py
+```
+## Docker
+```bash
+docker build -t inventory-env .
+docker run -p 8000:8000 inventory-env
+```
+## Project Structure
+```
+V2/
+├── models.py              # InventoryAction, InventoryObservation, InventoryState
+├── client.py              # EnvClient for remote WebSocket connections
+├── inference.py           # LLM inference script (runs all 3 tasks)
+├── openenv.yaml           # OpenEnv spec manifest
+├── pyproject.toml         # Python dependencies
+├── Dockerfile             # Container build
+├── server/
+│   ├── app.py             # FastAPI server (create_app)
+│   ├── inventory_env.py   # Environment (reset, step, state)
+│   ├── constants.py       # Prices, stock, events, task configs
+│   └── grader.py          # Floor/ceiling baselines and scoring
+└── scripts/
+    └── validate-submission.sh  # Pre-submission validator
+```
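
As a worked illustration of the reward signals listed in the README's Reward Function section, the sketch below adds the five terms up for a single day. The helper name `step_reward` and its argument layout are assumptions made for this example; the environment's own `step` method may compute the same terms in a different order or from different intermediate values.

```python
from typing import Dict

# Sell and cost prices copied from the README product table.
SELL_PRICE = {"electronics": 150.0, "clothing": 40.0, "groceries": 10.0,
              "furniture": 200.0, "toys": 25.0}
COST_PRICE = {"electronics": 100.0, "clothing": 25.0, "groceries": 5.0,
              "furniture": 130.0, "toys": 12.0}


def step_reward(sold: Dict[str, int], missed: Dict[str, int],
                expired_groceries: int, failed_orders: int,
                liquidated: Dict[str, int]) -> float:
    """Hypothetical helper: sum the five reward terms named in the README."""
    reward = 0.0
    for product, units in sold.items():
        reward += units * SELL_PRICE[product] * 0.001   # successful sales
    for product, units in missed.items():
        reward -= units * SELL_PRICE[product] * 0.001   # missed sales
    reward -= 0.05 * expired_groceries                  # expired groceries
    reward -= 0.5 * failed_orders                       # orders exceeding cash
    for product, units in liquidated.items():
        reward -= units * COST_PRICE[product] * 0.001   # liquidation loss
    return reward


if __name__ == "__main__":
    r = step_reward(sold={"electronics": 4, "groceries": 20},
                    missed={"furniture": 1},
                    expired_groceries=2, failed_orders=0,
                    liquidated={"groceries": 10})
    print(round(r, 3))
```

In this made-up day the sales terms contribute +0.8, and the missed furniture sale (-0.2), two expired grocery units (-0.1) and ten liquidated grocery units (-0.05) bring the total to 0.45. A single failed order at -0.5 would outweigh most of a day's sales reward.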

__init__.py ADDED

File without changes

client.py ADDED

	@@ -0,0 +1,67 @@

+from __future__ import annotations
+from typing import Any, Dict
+from openenv.core.client_types import StepResult
+from openenv.core.env_client import EnvClient
+from models import InventoryAction, InventoryObservation, InventoryState
+class InventoryEnv(EnvClient[InventoryAction, InventoryObservation, InventoryState]):
+    def _step_payload(self, action : InventoryAction) -> Dict[str, Any]:
+        payload: Dict[str, Any] = {}
+        if action.buy_quantities is not None:
+            payload["buy_quantities"] = action.buy_quantities
+        if action.delivery_method is not None:
+            payload["delivery_method"] = action.delivery_method
+        if action.liquidate is not None:
+            payload["liquidate"] = action.liquidate
+        return payload
+    def _parse_result(self, payload: Dict) -> StepResult[InventoryObservation]:
+        obs_data = payload.get("observation", {})
+        observation = InventoryObservation(
+            current_day = obs_data.get("current_day", 0),
+            total_cash = obs_data.get("total_cash", 0),
+            day_profit = obs_data.get("day_profit", 0),
+            total_profit = obs_data.get("total_profit", 0),
+            demand_today = obs_data.get("demand_today", {}),
+            updated_inventory = obs_data.get("updated_inventory", {}),
+            remaining_capacity = obs_data.get("remaining_capacity", {}),
+            updated_events = obs_data.get("updated_events", {}),
+            updated_deliveries = obs_data.get("updated_deliveries", []),
+            done = obs_data.get("done", False),
+            reward = obs_data.get("reward", 0.0),
+            metadata=obs_data.get("metadata", {}),
+        )
+        return StepResult(
+            observation = observation,
+            reward = observation.reward,
+            done = observation.done,
+        )
+    def _parse_state(self, payload: Dict[str, Any]) -> InventoryState:
+        return InventoryState(
+            episode_id = payload.get("episode_id", ""),
+            current_day = payload.get("current_day", 0),
+            cash = payload.get("cash", 0.0),
+            inventory = payload.get("inventory", {}),
+        )

curl ADDED

File without changes

inference.py ADDED

	@@ -0,0 +1,228 @@

+"""
+Inference Script — Inventory Optimization Environment
+=======================================================
+Required env vars:
+    API_BASE_URL   The API endpoint for the LLM.
+    MODEL_NAME     The model identifier to use for inference.
+    HF_TOKEN       Your Hugging Face / API key.
+"""
+import os
+import json
+import textwrap
+from openai import OpenAI
+from server.inventory_env import InventoryEnvironment
+from server.constants import EXTRA_INVENTORY_COST
+from models import InventoryAction
+from dotenv import load_dotenv
+load_dotenv()
+API_BASE_URL = os.getenv("API_BASE_URL") or "https://router.huggingface.co/v1"
+API_KEY = os.getenv("HF_TOKEN") or os.getenv("API_KEY")
+MODEL_NAME = os.getenv("MODEL_NAME")
+MAX_DAYS = 30
+SYSTEM_PROMPT = textwrap.dedent("""
+    You are an inventory management AI agent. Each day you receive the current state
+    of a retail store with 5 products: electronics, clothing, groceries, furniture, toys.
+    Groceries are perishable (5-day shelf life). Other products don't expire.
+    Product selling prices: electronics=$150, clothing=$40, groceries=$10, furniture=$200, toys=$25
+    Product cost prices: electronics=$100, clothing=$25, groceries=$5, furniture=$130, toys=$12
+    Profit margins: electronics=$50, clothing=$15, groceries=$5, furniture=$70, toys=$13
+    Shipping costs per unit: slow=$2 (5 days), medium=$5 (3 days), fast=$10 (1 day)
+    Warehouse capacity: electronics=100, clothing=200, groceries=500, furniture=50, toys=300
+    Events (like black_friday, christmas) boost demand when their countdown hits 0.
+    Weekends (day%7 == 5 or 6) have 1.2x demand.
+    CRITICAL STRATEGY:
+    - You MUST restock products when inventory is low. If you don't buy, you run out of
+      stock and miss sales. Missed sales = lost revenue = negative reward.
+    - Check today's demand to estimate tomorrow's needs.
+    - Do NOT overbuy when demand is low - unsold stock ties up cash, warehouse space and perishables expire.
+    - Prioritize high-margin products: furniture ($70 profit), electronics ($50 profit).
+    - Stock up BEFORE events hit (check event countdowns).
+    Each day you must respond with a JSON action:
+    {
+        "buy_quantities": {"product_name": quantity, ...},
+        "delivery_method": "slow" | "medium" | "fast",
+        "liquidate": {"product_name": quantity, ...}
+    }
+    - buy_quantities: products and amounts to order.
+    - delivery_method: shipping speed for this order
+    - liquidate: products and amounts to dispose of (no revenue, empty {} to skip)
+      Use liquidate to free up warehouse space before a restock.
+    You will see what demand occurred today AFTER it happened. Use this to spot trends
+    and plan restocking. A negative reward means your last action was bad — adjust.
+    Do NOT buy more than you can afford. Do NOT buy on the last day.
+    Respond with ONLY valid JSON, no explanation.
+""").strip()
+def format_observation(obs):
+    """Convert observation into a readable prompt for the LLM."""
+    # format inventory with batch detail, remaining capacity, and extra cost
+    inv_lines = []
+    for product, batches in obs.updated_inventory.items():
+        total = sum(b[0] for b in batches)
+        remaining = obs.remaining_capacity.get(product, 0)
+        extra_cost = EXTRA_INVENTORY_COST.get(product, 0)
+        batch_detail = ", ".join(
+            f"{b[0]} units" + (f" ({b[1]}d left)" if b[1] is not None else "")
+            for b in batches
+        )
+        inv_lines.append(f"  {product}: {total} total [{batch_detail}] | space left: {remaining} (extra space: ${extra_cost}/unit)")
+    inv_text = "\n".join(inv_lines)
+    # format events
+    event_lines = []
+    for event, days in obs.updated_events.items():
+        if days > 0:
+            event_lines.append(f"  {event}: in {days} days")
+        else:
+            event_lines.append(f"  {event}: ACTIVE NOW")
+    events_text = "\n".join(event_lines) if event_lines else "  None"
+    # format deliveries
+    delivery_lines = []
+    for delivery in obs.updated_deliveries:
+        for product, shipment in delivery.items():
+            qty, arrival_day = shipment
+            days_away = arrival_day - obs.current_day
+            delivery_lines.append(f"  {product}: {qty} units arriving in {days_away} days")
+    deliveries_text = "\n".join(delivery_lines) if delivery_lines else "  None"
+    # format demand (already happened today — feedback, not prediction)
+    demand_lines = []
+    for product, units in obs.demand_today.items():
+        demand_lines.append(f"  {product}: {units} units")
+    demand_text = "\n".join(demand_lines) if demand_lines else "  No demand data yet"
+    prompt = f"""Day: {obs.current_day}/{MAX_DAYS}
+Cash: ${obs.total_cash:.2f}
+Day Profit: ${obs.day_profit:.2f}
+Total Profit: ${obs.total_profit:.2f}
+Last Step Reward: {obs.reward:.3f}
+Inventory:
+{inv_text}
+Demand That Occurred Today:
+{demand_text}
+Upcoming Events:
+{events_text}
+Pending Deliveries:
+{deliveries_text}
+Respond with your action as JSON."""
+    return prompt
+def parse_action(response_text):
+    """Parse LLM response into InventoryAction."""
+    try:
+        text = response_text.strip()
+        if text.startswith("```"):
+            text = text.split("\n", 1)[1]
+            text = text.rsplit("```", 1)[0]
+        data = json.loads(text)
+        return InventoryAction(**data)
+    except Exception:
+        print(response_text)
+        return InventoryAction(
+            buy_quantities={},
+            delivery_method="slow",
+            liquidate={},
+        )
+def run_task(client, task_name):
+    """Run a single task and return total profit."""
+    env = InventoryEnvironment(task_name)
+    obs = env.reset()
+    print(f"\n{'=' * 50}")
+    print(f"Task: {task_name.upper()} | Cash: ${obs.total_cash:.2f} | Days: {env.max_days}")
+    print(f"{'=' * 50}")
+    for day in range(1, env.max_days + 1):
+        if obs.done:
+            print("Episode ended early.")
+            break
+        user_prompt = format_observation(obs)
+        messages = [
+            {"role": "system", "content": SYSTEM_PROMPT},
+            {"role": "user", "content": user_prompt},
+        ]
+        try:
+            completion = client.chat.completions.create(
+                model=MODEL_NAME,
+                messages=messages,
+                # temperature=0.2,
+                max_completion_tokens=300,
+                stream=False,
+            )
+            response_text = completion.choices[0].message.content or ""
+        except Exception as exc:
+            print(f"  LLM request failed: {exc}. Skipping turn.")
+            response_text = "{}"
+        action = parse_action(response_text)
+        print(f"Day {day}: buy={action.buy_quantities} delivery={action.delivery_method} liquidate={action.liquidate}")
+        obs = env.step(action)
+        print(f"  Cash: ${obs.total_cash:.2f} | Day Profit: ${obs.day_profit:.2f} | Reward: {obs.reward:.3f}")
+    print(f"Task {task_name} complete | Total profit: ${obs.total_profit:.2f}")
+    return obs.total_profit
+def main():
+    from server.grader import grade_all, compute_baselines
+    client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
+    # print baselines first
+    print(f"\n{'=' * 50}")
+    print("BASELINES")
+    print(f"{'=' * 50}")
+    for task_name in ["easy", "medium", "hard"]:
+        floor, ceiling = compute_baselines(task_name)
+        print(f"  {task_name}: floor=${floor:.2f} (passive) | ceiling=${ceiling:.2f} (heuristic)")
+    results = {}
+    for task_name in ["easy", "medium", "hard"]:
+        profit = run_task(client, task_name)
+        results[task_name] = profit
+    scores = grade_all(results)
+    print(f"\n{'=' * 50}")
+    print("FINAL SCORES")
+    print(f"{'=' * 50}")
+    for task_name, score in scores.items():
+        floor, ceiling = compute_baselines(task_name)
+        print(f"  {task_name}: {score:.3f} (profit: ${results[task_name]:.2f} | floor: ${floor:.2f} | ceiling: ${ceiling:.2f})")
+if __name__ == "__main__":
+    main()
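
`parse_action` above strips a Markdown code fence from the model's reply before calling `json.loads`. The standalone sketch below isolates that step so it can be exercised without the environment or an API key. The names `extract_json` and `FENCE` are hypothetical, and returning an empty dict on failure is an assumption chosen to mirror the "do nothing" fallback action in `parse_action`.

```python
import json
from typing import Any, Dict

# Three backticks, built programmatically so the marker never appears literally.
FENCE = "`" * 3


def extract_json(response_text: str) -> Dict[str, Any]:
    """Hypothetical helper: pull a JSON object out of an LLM reply.

    Accepts bare JSON or JSON wrapped in a Markdown code fence and
    returns an empty dict when nothing parses as a JSON object.
    """
    text = response_text.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line, which may carry a language tag.
        _, _, text = text.partition("\n")
        # Drop the closing fence and anything after it.
        text = text.rsplit(FENCE, 1)[0]
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return {}
    return data if isinstance(data, dict) else {}


if __name__ == "__main__":
    reply = FENCE + 'json\n{"buy_quantities": {"toys": 5}}\n' + FENCE
    print(extract_json(reply))
    print(extract_json("Sorry, I cannot help with that."))
```

Using `partition` instead of `split("\n", 1)[1]` avoids an `IndexError` when the reply is a fence with no newline. In `parse_action` that case is already absorbed by the broad `except Exception`.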

models.py ADDED

	@@ -0,0 +1,29 @@

+from __future__ import annotations
+from openenv.core.env_server import Action, Observation, State
+from typing import Literal, Dict, List, Optional
+class InventoryAction(Action):
+    buy_quantities : Dict[str, int] = {}
+    delivery_method : Literal["slow", "medium", "fast"] = "slow"
+    liquidate : Dict[str, int] = {}
+class InventoryObservation(Observation):
+    current_day : int
+    total_cash : float
+    day_profit : float
+    total_profit : float
+    demand_today : Dict[str, int]  # product -> units demanded today
+    updated_inventory : Dict[str, List[List[Optional[int]]]]  # product -> [[qty, days_left], ...] per batch
+    remaining_capacity : Dict[str, int]  # product -> remaining warehouse space
+    updated_events : Dict[str, int]
+    updated_deliveries : List[Dict[str, List[int]]]  # each entry: {product: [quantity, arrival_day]}
+class InventoryState(State):
+    episode_id : str
+    current_day : int
+    cash : float
+    inventory : Dict[str, int]

openenv.yaml ADDED

	@@ -0,0 +1,6 @@

+spec_version: 1
+name: inventory_env
+type: space
+runtime: fastapi
+app: server.app:app
+port: 8000

pyproject.toml ADDED

	@@ -0,0 +1,18 @@

+[project]
+name = "inventory-env"
+version = "0.1.0"
+description = "Retail Inventory Optimization RL Environment for OpenEnv"
+requires-python = ">=3.10"
+dependencies = [
+    "openenv-core[core]>=0.2.0",
+    "fastapi>=0.115.0",
+    "uvicorn>=0.24.0",
+    "pydantic>=2.0.0",
+    "numpy>=1.24.0",
+    "openai>=1.0.0",
+    "python-dotenv>=1.0.0",
+]
+[build-system]
+requires = ["setuptools"]
+build-backend = "setuptools.build_meta"

server/__init__.py ADDED

File without changes

server/app.py ADDED

	@@ -0,0 +1,14 @@

+from openenv.core.env_server import create_app
+from server.inventory_env import InventoryEnvironment
+from models import InventoryAction, InventoryObservation
+app = create_app(InventoryEnvironment, InventoryAction, InventoryObservation, env_name="inventory_env")
+def main():
+    import uvicorn
+    uvicorn.run(app, host="0.0.0.0", port=8000)
+if __name__ == "__main__":
+    main()

server/constants.py ADDED

	@@ -0,0 +1,178 @@

+INITIAL_CASH = 1000.0
+# Product name -> base price (selling price before multiplier)
+BASE_PRICES = {
+    "electronics": 150.0,
+    "clothing": 40.0,
+    "groceries": 10.0,
+    "furniture": 200.0,
+    "toys": 25.0,
+}
+# Product name -> cost price (what you pay to buy stock)
+COST_PRICES = {
+    "electronics": 100.0,
+    "clothing": 25.0,
+    "groceries": 5.0,
+    "furniture": 130.0,
+    "toys": 12.0,
+}
+# Product name -> shelf life in days (None = no expiry)
+SHELF_LIFE = {
+    "electronics": None,
+    "clothing": None,
+    "groceries": 5,
+    "furniture": None,
+    "toys": None,
+}
+# Product name -> starting stock quantity
+INITIAL_STOCK = {
+    "electronics": 10,
+    "clothing": 20,
+    "groceries": 50,
+    "furniture": 5,
+    "toys": 30,
+}
+# Delivery method -> cost per unit
+SHIPPING_COST = {
+    "slow": 2.0,
+    "medium": 5.0,
+    "fast": 10.0,
+}
+# Delivery method -> days to arrive
+SHIPPING_DAYS = {
+    "slow": 5,
+    "medium": 3,
+    "fast": 1,
+}
+# Event name -> days until event (spread across 30 days)
+EVENTS = {
+    "black_friday": 6,
+    "christmas": 12,
+    "back_to_school": 18,
+    "summer_clearance": 24,
+    "new_competitor": 28,
+}
+# Product name -> max inventory space (units)
+INVENTORY_CAPACITY = {
+    "electronics": 100,
+    "clothing": 200,
+    "groceries": 500,
+    "furniture": 50,
+    "toys": 300,
+}
+# Product name -> additional cost per unit for extra inventory beyond capacity
+EXTRA_INVENTORY_COST = {
+    "electronics": 20.0,
+    "clothing": 5.0,
+    "groceries": 2.0,
+    "furniture": 30.0,
+    "toys": 4.0,
+}
+# Product name -> (min_demand, max_demand) per day
+BASE_DEMAND = {
+    "electronics": (3, 8),
+    "clothing": (5, 15),
+    "groceries": (20, 40),
+    "furniture": (1, 3),
+    "toys": (5, 12),
+}
+WEEKEND_MULTIPLIER = 1.2
+# Event name -> {product: demand_multiplier} when event triggers
+EVENT_EFFECTS = {
+    "black_friday": {"electronics": 3.0, "clothing": 2.5, "toys": 2.0, "furniture": 1.5, "groceries": 1.0},
+    "christmas": {"toys": 3.0, "electronics": 2.0, "clothing": 1.5, "furniture": 1.0, "groceries": 1.5},
+    "back_to_school": {"clothing": 2.5, "electronics": 1.5, "toys": 1.5, "furniture": 1.0, "groceries": 1.0},
+    "summer_clearance": {"clothing": 2.0, "toys": 1.5, "electronics": 1.0, "furniture": 1.5, "groceries": 1.0},
+    "new_competitor": {"electronics": 0.6, "clothing": 0.7, "toys": 0.7, "furniture": 0.8, "groceries": 0.9},
+}
+EVENT_DURATION = 2
+MAX_DAYS = 30
+UPGRADE_DELIVERY_COST = 50.0
+# Task configs for easy/medium/hard
+TASKS = {
+    # Easy: Low starting stock, low steady demand, no events, full warehouse capacity.
+    # Agent needs to restock regularly, but demand is predictable.
+    "easy": {
+        "seed": 100,
+        "max_days": 30,
+        "initial_cash": 1000.0,
+        "events": {},  # no events
+        "initial_stock": {
+            "electronics": 5,
+            "clothing": 10,
+            "groceries": 20,
+            "furniture": 3,
+            "toys": 10,
+        },
+        "inventory_capacity": INVENTORY_CAPACITY,
+        "base_demand": {
+            "electronics": (2, 5),
+            "clothing": (3, 10),
+            "groceries": (15, 30),
+            "furniture": (1, 2),
+            "toys": (3, 8),
+        },
+    },
+    # Medium: Default stock/cash, all 5 events spread across 30 days, normal demand.
+    # Agent must anticipate demand spikes from events and restock accordingly.
+    "medium": {
+        "seed": 200,
+        "max_days": 30,
+        "initial_cash": 1000.0,
+        "events": EVENTS,
+        "initial_stock": INITIAL_STOCK,
+        "inventory_capacity": INVENTORY_CAPACITY,
+        "base_demand": BASE_DEMAND,
+    },
+    # Hard: Half starting cash ($500), low stock, events packed close together,
+    # higher demand, smaller warehouse. Agent must balance tight budget,
+    # overlapping event spikes, and fast-expiring groceries.
+    "hard": {
+        "seed": 300,
+        "max_days": 30,
+        "initial_cash": 500.0,
+        "events": {
+            "black_friday": 4,
+            "christmas": 8,
+            "back_to_school": 12,
+            "summer_clearance": 16,
+            "new_competitor": 20,
+        },
+        "initial_stock": {
+            "electronics": 5,
+            "clothing": 10,
+            "groceries": 30,
+            "furniture": 3,
+            "toys": 15,
+        },
+        "inventory_capacity": {
+            "electronics": 50,
+            "clothing": 100,
+            "groceries": 250,
+            "furniture": 25,
+            "toys": 150,
+        },
+        "base_demand": {
+            "electronics": (5, 12),
+            "clothing": (8, 20),
+            "groceries": (30, 60),
+            "furniture": (2, 5),
+            "toys": (8, 18),
+        },
+    },
+}
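
The price and shipping tables above determine how much each unit actually earns once delivery is paid for. The sketch below works that arithmetic out. `landed_margin` and `margin_table` are hypothetical helpers, and the calculation deliberately ignores capacity overage fees, spoilage, and unsold stock, so it is an upper bound per unit sold rather than a profit forecast.

```python
from typing import Dict

# Values copied from server/constants.py.
BASE_PRICES = {"electronics": 150.0, "clothing": 40.0, "groceries": 10.0,
               "furniture": 200.0, "toys": 25.0}
COST_PRICES = {"electronics": 100.0, "clothing": 25.0, "groceries": 5.0,
               "furniture": 130.0, "toys": 12.0}
SHIPPING_COST = {"slow": 2.0, "medium": 5.0, "fast": 10.0}


def landed_margin(product: str, method: str) -> float:
    """Hypothetical helper: sell price minus cost price minus per-unit shipping."""
    return BASE_PRICES[product] - COST_PRICES[product] - SHIPPING_COST[method]


def margin_table() -> Dict[str, Dict[str, float]]:
    """Per-unit margin for every product and delivery method."""
    return {p: {m: landed_margin(p, m) for m in SHIPPING_COST}
            for p in BASE_PRICES}


if __name__ == "__main__":
    for product, row in margin_table().items():
        print(f"{product:12s} {row}")
```

The table shows that groceries only break even with medium shipping and lose $5 per unit with fast shipping, while furniture keeps a $60 margin even at the fastest rate. An agent that defaults to fast delivery for every product therefore gives up its grocery margin entirely.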

server/grader.py ADDED

	@@ -0,0 +1,135 @@

+"""
+Grader for inventory optimization tasks.
+Scores agent performance on a 0.0-1.0 scale using floor/ceiling approach.
+  - floor: passive agent (no buys, just sells initial stock until empty)
+  - ceiling: heuristic agent (restocks toward a multi-day demand buffer, stocks up before events)
+"""
+from server.inventory_env import InventoryEnvironment
+from models import InventoryAction
+from server.constants import TASKS, BASE_PRICES, COST_PRICES, SHIPPING_COST
+def _run_passive(task_name):
+    """Floor baseline: do nothing, just sell whatever initial stock covers."""
+    env = InventoryEnvironment(task_name)
+    obs = env.reset()
+    while not obs.done:
+        action = InventoryAction(
+            buy_quantities={},
+            delivery_method="slow",
+            liquidate={},
+        )
+        obs = env.step(action)
+    return obs.total_profit
+def _run_heuristic(task_name):
+    """Ceiling baseline: smart heuristic that stocks up before events."""
+    task = TASKS[task_name]
+    env = InventoryEnvironment(task_name)
+    obs = env.reset()
+    while not obs.done:
+        buy = {}
+        delivery = "medium"
+        liquidate = {}
+        # check if any event is imminent (within 3 days)
+        event_soon = False
+        for event, days in obs.updated_events.items():
+            if 0 < days <= 3:
+                event_soon = True
+                break
+        for product, (lo, hi) in task["base_demand"].items():
+            avg_demand = (lo + hi) // 2
+            current = sum(b[0] for b in obs.updated_inventory.get(product, []))
+            if event_soon:
+                # stock up 5 days' worth before events, use fast shipping
+                target = avg_demand * 5
+                delivery = "fast"
+            else:
+                # normal: keep 3 days' buffer
+                target = avg_demand * 3
+            if current < target:
+                buy[product] = target - current
+        # liquidate groceries about to expire (1 day left)
+        for batch in obs.updated_inventory.get("groceries", []):
+            if batch[1] is not None and batch[1] <= 1:
+                liquidate["groceries"] = liquidate.get("groceries", 0) + batch[0]
+        # don't buy on last 2 days
+        if obs.current_day >= task["max_days"] - 2:
+            buy = {}
+        # don't buy more than cash allows (rough check)
+        total_cost = sum(qty * (COST_PRICES[p] + SHIPPING_COST[delivery]) for p, qty in buy.items())
+        if total_cost > obs.total_cash * 0.8:
+            # scale down proportionally
+            scale = (obs.total_cash * 0.8) / total_cost if total_cost > 0 else 0
+            buy = {p: max(1, int(qty * scale)) for p, qty in buy.items()}
+        action = InventoryAction(
+            buy_quantities=buy,
+            delivery_method=delivery,
+            liquidate=liquidate,
+        )
+        obs = env.step(action)
+    return obs.total_profit
+def compute_baselines(task_name):
+    """Pre-compute floor and ceiling for a task."""
+    floor = _run_passive(task_name)
+    ceiling = _run_heuristic(task_name)
+    return floor, ceiling
+def grade(task_name, agent_profit):
+    """
+    Grade agent performance on 0.0-1.0 scale.
+    Args:
+        task_name: "easy", "medium", or "hard"
+        agent_profit: total profit achieved by the agent
+    Returns:
+        float score between 0.0 and 1.0
+    """
+    floor, ceiling = compute_baselines(task_name)
+    if ceiling <= floor:
+        return 1.0 if agent_profit >= ceiling else 0.0
+    score = (agent_profit - floor) / (ceiling - floor)
+    return max(0.0, min(1.0, score))
+def grade_all(results):
+    """
+    Grade all 3 tasks.
+    Args:
+        results: dict of {task_name: agent_profit}
+    Returns:
+        dict of {task_name: score}
+    """
+    scores = {}
+    for task_name, agent_profit in results.items():
+        scores[task_name] = grade(task_name, agent_profit)
+    return scores
+if __name__ == "__main__":
+    print("Computing baselines for all tasks...")
+    for task_name in ["easy", "medium", "hard"]:
+        floor, ceiling = compute_baselines(task_name)
+        print(f"  {task_name}: floor={floor:.2f}, ceiling={ceiling:.2f}")
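
The normalization in `grade` can be checked without running any episodes. The sketch below repeats the same floor/ceiling arithmetic as a pure function; `normalized_score` is a hypothetical name, and the floor and ceiling values in the usage lines are made-up numbers for illustration, not baselines measured from this environment.

```python
def normalized_score(agent_profit: float, floor: float, ceiling: float) -> float:
    """Map a profit onto [0.0, 1.0] relative to two baseline profits."""
    if ceiling <= floor:
        # Degenerate baselines: fall back to pass/fail against the ceiling.
        return 1.0 if agent_profit >= ceiling else 0.0
    raw = (agent_profit - floor) / (ceiling - floor)
    return max(0.0, min(1.0, raw))  # clamp to the valid range


if __name__ == "__main__":
    # Illustrative baselines only (assumed, not measured).
    floor, ceiling = 200.0, 1000.0
    for profit in (100.0, 600.0, 1200.0):
        print(profit, normalized_score(profit, floor, ceiling))
```

With these illustrative baselines, a profit halfway between floor and ceiling scores 0.5, anything at or below the floor scores 0.0, and anything at or above the ceiling scores 1.0, so beating the heuristic earns no extra credit.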

server/inventory_env.py ADDED

	@@ -0,0 +1,246 @@

+from openenv.core.env_server.interfaces import Environment
+import copy
+import random
+from uuid import uuid4
+from models import InventoryAction, InventoryObservation, InventoryState
+from .constants import (
+    INITIAL_CASH, BASE_PRICES, COST_PRICES, SHELF_LIFE, INITIAL_STOCK,
+    EVENTS, SHIPPING_COST, SHIPPING_DAYS, INVENTORY_CAPACITY,
+    EXTRA_INVENTORY_COST, BASE_DEMAND, WEEKEND_MULTIPLIER, EVENT_EFFECTS,
+    EVENT_DURATION, MAX_DAYS, UPGRADE_DELIVERY_COST, TASKS,
+)
+def _build_inventory(stock):
+    """Convert stock dict to batch format: {product: [[qty, days_left], ...]}"""
+    inv = {}
+    for product, qty in stock.items():
+        shelf = SHELF_LIFE[product]
+        inv[product] = [[qty, shelf]]
+    return inv
+class InventoryEnvironment(Environment):
+    def __init__(self, task_name="medium"):
+        self.task_name = task_name
+        self.task = TASKS[task_name]
+        self.cash = self.task["initial_cash"]
+        self.inventory = _build_inventory(self.task["initial_stock"])
+        self.events = copy.deepcopy(self.task["events"])
+        self.deliveries = []
+        self.current_day = 0
+        self.total_profit = 0.0
+        self.seed = self.task["seed"]
+        self.reward = 0.0
+        self.max_days = self.task["max_days"]
+        self.inventory_capacity = self.task["inventory_capacity"]
+        self.base_demand = self.task["base_demand"]
+        self.reset()
+    def reset(self, seed: int = None) -> InventoryObservation:
+        if seed is not None:
+            self.seed = seed
+        else:
+            self.seed = self.task["seed"]
+        self.cash = self.task["initial_cash"]
+        self.inventory = _build_inventory(self.task["initial_stock"])
+        self.events = copy.deepcopy(self.task["events"])
+        self.deliveries = []
+        self.current_day = 0
+        self.total_profit = 0.0
+        self.reward = 0.0
+        self._state = InventoryState(
+            episode_id = str(uuid4()),
+            current_day = 0,
+            cash = self.task["initial_cash"],
+            inventory = dict(self.task["initial_stock"])
+        )
+        return InventoryObservation(
+            current_day = 0,
+            total_cash = self.cash,
+            day_profit = 0.0,
+            total_profit = 0.0,
+            demand_today = {},
+            updated_inventory = copy.deepcopy(self.inventory),
+            remaining_capacity = {p: max(0, self.inventory_capacity[p] - sum(b[0] for b in self.inventory[p])) for p in self.inventory},
+            updated_events = copy.deepcopy(self.events),
+            updated_deliveries = [],
+            reward = 0.0,
+            done = False,
+        )
+    def step(self, action: InventoryAction) -> InventoryObservation:
+        self.current_day += 1
+        self.reward = 0.0  # reset reward each step
+        day_cost = 0.0
+        day_revenue = 0.0
+        # 1. tick event countdowns
+        for event_name in self.events:
+            if self.events[event_name] > 0:
+                self.events[event_name] -= 1
+        # 2. age grocery batches by one day and drop the ones that have expired
+        new_batches = []
+        expired_groceries_count = 0
+        for batch in self.inventory["groceries"]:
+            if batch[1] <= 0:
+                # no shelf life left: count the units as waste and drop the batch
+                expired_groceries_count += batch[0]
+            else:
+                new_batches.append([batch[0], batch[1] - 1])
+        self.inventory["groceries"] = new_batches
+        self.reward -= 0.05 * expired_groceries_count
+        # 3. Handle incoming deliveries
+        remaining_deliveries = []
+        for delivery in self.deliveries:
+            for product, shipment in delivery.items():
+                qty, arrival_day = shipment
+                if arrival_day <= self.current_day:
+                    self.inventory[product].append([qty, SHELF_LIFE[product]])
+                else:
+                    remaining_deliveries.append(delivery)
+        self.deliveries = remaining_deliveries
+        # 4. process purchases
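+        for product, qty in action.buy_quantities.items():
+            # skip unknown products and empty or negative orders
+            # (a negative qty would otherwise yield a negative cost and add cash)
+            if product not in self.inventory or qty <= 0:
+                continue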
+            unit_cost = COST_PRICES[product] + SHIPPING_COST[action.delivery_method]
+            total_cost = qty * unit_cost
+            # capacity overage cost
+            current_qty = sum(b[0] for b in self.inventory[product])
+            overage = max(0, (current_qty + qty) - self.inventory_capacity[product])
+            extra_cost = overage * EXTRA_INVENTORY_COST[product]
+            total_cost += extra_cost
+            if total_cost > self.cash:
+                self.reward -= 0.5  # penalize for ordering what you can't afford
+                continue
+            self.cash -= total_cost
+            day_cost += total_cost
+            arrival_day = self.current_day + SHIPPING_DAYS[action.delivery_method]
+            self.deliveries.append({product: [qty, arrival_day]})
+        # 5. generate demand
+        demand = self._generate_demand()
+        # 6. sell products (FIFO: oldest batches first)
+        for product, demand_today in demand.items():
+            product_availability = sum(batch[0] for batch in self.inventory[product])
+            if demand_today > product_availability:
+                missed_sales = demand_today - product_availability
+                sold = product_availability
+                day_revenue += sold * BASE_PRICES[product]
+                self.inventory[product] = []
+                self.reward -= missed_sales * BASE_PRICES[product] * 0.001
+                self.reward += sold * BASE_PRICES[product] * 0.001
+            else:
+                day_revenue += demand_today * BASE_PRICES[product]
+                self.reward += demand_today * BASE_PRICES[product] * 0.001
+                new_batches = []
+                for batch in self.inventory[product]:
+                    if demand_today == 0:
+                        new_batches.append(batch)
+                    elif batch[0] <= demand_today:
+                        # batch fully consumed; drop it rather than keep a zero-quantity entry
+                        demand_today -= batch[0]
+                    else:
+                        new_batches.append([batch[0] - demand_today, batch[1]])
+                        demand_today = 0
+                self.inventory[product] = new_batches
+        # 7. Liquidate some stock (FIFO, no revenue)
+        total_liquidation_loss = 0.0
+        for product, count in action.liquidate.items():
+            if product not in self.inventory or count <= 0:
+                continue
+            actually_removed = min(count, sum(b[0] for b in self.inventory[product]))
+            total_liquidation_loss += actually_removed * COST_PRICES[product]
+            remaining = count
+            new_batches = []
+            for batch in self.inventory[product]:
+                if remaining <= 0:
+                    new_batches.append(batch)
+                elif batch[0] <= remaining:
+                    remaining -= batch[0]
+                else:
+                    new_batches.append([batch[0] - remaining, batch[1]])
+                    remaining = 0
+            self.inventory[product] = new_batches
+        self.reward -= total_liquidation_loss * 0.001
+        # compute day profit
+        day_profit = day_revenue - day_cost
+        self.cash += day_revenue
+        self.total_profit += day_profit
+        # check done
+        done = self.current_day >= self.max_days
+        # update state
+        self._state = InventoryState(
+            episode_id = self._state.episode_id,
+            current_day = self.current_day,
+            cash = self.cash,
+            inventory = {p: sum(b[0] for b in self.inventory[p]) for p in self.inventory},
+        )
+        return InventoryObservation(
+            current_day = self.current_day,
+            total_cash = self.cash,
+            day_profit = day_profit,
+            total_profit = self.total_profit,
+            demand_today = demand,
+            updated_inventory = copy.deepcopy(self.inventory),
+            remaining_capacity = {p: max(0, self.inventory_capacity[p] - sum(b[0] for b in self.inventory[p])) for p in self.inventory},
+            updated_events = copy.deepcopy(self.events),
+            updated_deliveries = copy.deepcopy(self.deliveries),
+            reward = self.reward,
+            done = done,
+        )
+    def _generate_demand(self):
+        # per-day RNG: demand is reproducible for a given (seed, current_day) pair
+        rng = random.Random(self.seed * 1000 + self.current_day)
+        demand = {}
+        for product, (lo, hi) in self.base_demand.items():
+            demand[product] = rng.randint(lo, hi)
+        # weekend boost
+        if self.current_day % 7 in (5, 6):
+            for product in demand:
+                demand[product] = int(demand[product] * WEEKEND_MULTIPLIER)
+        # event multipliers: an event applies once its countdown (ticked in step()) has reached 0
+        for event_name, days in self.events.items():
+            if days <= 0 and event_name in EVENT_EFFECTS:
+                for product, mult in EVENT_EFFECTS[event_name].items():
+                    demand[product] = int(demand[product] * mult)
+        return demand
+    @property
+    def state(self) -> InventoryState:
+        return self._state
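The sell and liquidate steps in `step()` both walk the `[quantity, days_left]` batches oldest-first. As a minimal sketch of that FIFO pattern, assuming the same batch layout, the logic could be factored into one helper. The names `consume_fifo` and `Batch` are hypothetical and do not appear in the commit.

```python
from typing import List, Tuple

# Hypothetical alias: [quantity, days_left], mirroring the batches in step().
Batch = List[int]


def consume_fifo(batches: List[Batch], amount: int) -> Tuple[List[Batch], int]:
    """Remove up to `amount` units, taking from the oldest batches first.

    Returns the batches that remain and the number of units actually removed.
    """
    requested = max(0, amount)
    remaining = requested
    kept: List[Batch] = []
    for qty, days_left in batches:
        if remaining == 0:
            # Nothing left to take: keep the batch unchanged.
            kept.append([qty, days_left])
        elif qty <= remaining:
            # Batch is fully consumed, so it is dropped.
            remaining -= qty
        else:
            # Batch is partially consumed; keep what is left of it.
            kept.append([qty - remaining, days_left])
            remaining = 0
    return kept, requested - remaining


if __name__ == "__main__":
    left, removed = consume_fifo([[5, 2], [10, 7]], 8)
    print(left, removed)
```

With a helper like this, selling would call it with the day's demand and liquidation with the requested count. The returned `removed` value covers the shortfall case, where demand exceeds the stock on hand.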