Instructions to use lenamerkli/ingredient-scanner with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use lenamerkli/ingredient-scanner with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="lenamerkli/ingredient-scanner",
	filename="llm.Q4_K_M.gguf",
)

llm.create_chat_completion(
	messages = "\"cats.jpg\""
)

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use lenamerkli/ingredient-scanner with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf lenamerkli/ingredient-scanner:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf lenamerkli/ingredient-scanner:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf lenamerkli/ingredient-scanner:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf lenamerkli/ingredient-scanner:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf lenamerkli/ingredient-scanner:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf lenamerkli/ingredient-scanner:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf lenamerkli/ingredient-scanner:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf lenamerkli/ingredient-scanner:Q4_K_M

Use Docker

docker model run hf.co/lenamerkli/ingredient-scanner:Q4_K_M

LM Studio
Jan
Ollama
How to use lenamerkli/ingredient-scanner with Ollama:
```
ollama run hf.co/lenamerkli/ingredient-scanner:Q4_K_M
```

Unsloth Studio new

How to use lenamerkli/ingredient-scanner with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for lenamerkli/ingredient-scanner to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for lenamerkli/ingredient-scanner to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for lenamerkli/ingredient-scanner to start chatting

Docker Model Runner
How to use lenamerkli/ingredient-scanner with Docker Model Runner:
```
docker model run hf.co/lenamerkli/ingredient-scanner:Q4_K_M
```

Lemonade

How to use lenamerkli/ingredient-scanner with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull lenamerkli/ingredient-scanner:Q4_K_M

Run and chat with the model

lemonade run user.ingredient-scanner-Q4_K_M

List all available models

lemonade list

Ingredient Scanner

Abstract

With the recent advancements in computer vision and optical character recognition and using a convolutional neural network to cut out the product from a picture, it has now become possible to reliably extract ingredient lists from the back of a product using the Anthropic API. Open-weight or even only on-device optical character recognition lacks the quality to be used in a production environment, although the progress in development is promising. The Anthropic API is also currently not feasible due to the high cost of 1 Swiss Franc per 100 pictures.

The training code and data is available on GitHub. This repository just contains an inference example and the report.

This is an entry for the 2024 Swiss AI competition.

Abstract
Report
Model Details
Usage
Citation

Report

Read the full report here.

Model Details

This repository consists of two models, one vision model and a large language model.

Vision Model

Custom convolutional neural network based on ResNet18. It detects the four corner points and the upper and lower limits of a product.

Language Model

Converts the text from the optical character recognition engine which lies in-between the two models to JSON. It is fine-tuned from unsloth/Qwen2-0.5B-Instruct-bnb-4bit.

Usage

Clone the repository and install the dependencies on any debian-based system:

git clone https://huggingface.co/lenamerkli/ingredient-scanner
cd ingredient-scanner
python3 -m venv .venv
source .venv/bin/activate
pip3 install -r requirements.txt

Note: not all requirements are needed for inference, as both training and inference requirements are listed.

Select the OCR engine in main.py by uncommenting one of the lines 20 to 22:

# ENGINE: list[str] = ['easyocr']
# ENGINE: list[str] = ['anthropic', 'claude-3-5-sonnet-20240620']
# ENGINE: list[str] = ['llama_cpp/v2/vision', 'qwen-vl-next_b2583']

Note: Qwen-VL-Next is not an official qwen model. This is only to protect business secrets of a private model.

Run the inference script:

python3 main.py

You will be asked to enter the file path to a PNG image.

Anthropic API

If you want to use the Anthropic API, create a .env file with the following content:

ANTHROPIC_API_KEY=YOUR_API_KEY

Citation

Here is how to cite this paper in the bibtex format:

@misc{merkli2024ingriedient-scanner,
    title={Ingredient Scanner: Automating Reading of Ingredient Labels with Computer Vision},
    author={Lena Merkli and Sonja Merkli},
    date={2024-07-16},
    url={https://huggingface.co/lenamerkli/ingredient-scanner},
}

Downloads last month: 26

GGUF

Model size

0.5B params

Architecture

qwen2

Hardware compatibility

4-bit

lenamerkli
/

ingredient-scanner