Ingredient Scanner
Abstract
Recent advances in computer vision and optical character recognition, combined with a convolutional neural network that cuts the product out of a picture, make it possible to reliably extract ingredient lists from the back of a product using the Anthropic API. Open-weight or purely on-device optical character recognition still lacks the quality required for production use, although progress is promising. The Anthropic API, in turn, is currently not feasible because of its high cost of 1 Swiss franc per 100 pictures.
The training code and data are available on GitHub. This repository contains only an inference example and the report.
This is an entry for the 2024 Swiss AI competition.
Report
Read the full report here.
Model Details
This repository contains two models: a vision model and a large language model.
Vision Model
Custom convolutional neural network based on ResNet18. It detects the four corner points and the upper and lower limits of a product.
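As a rough illustration of how four predicted corner points could be turned into a crop region, here is a minimal sketch. It assumes the corners are available as (x, y) pairs in pixel coordinates; the function name `crop_box` and the sample values are hypothetical and not taken from this repository, and the upper and lower limits are left out.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def crop_box(corners: Sequence[Point], width: int, height: int) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) enclosing the corners, clamped to the image."""
    if len(corners) != 4:
        raise ValueError("expected exactly four corner points")
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    # Round outwards so the box never cuts into the detected product.
    left = max(0, math.floor(min(xs)))
    top = max(0, math.floor(min(ys)))
    right = min(width, math.ceil(max(xs)))
    bottom = min(height, math.ceil(max(ys)))
    return left, top, right, bottom


# Hypothetical corner predictions for a 640x480 picture.
corners = [(100.4, 50.2), (400.9, 60.0), (410.0, 300.7), (95.1, 310.3)]
box = crop_box(corners, width=640, height=480)
```

The resulting tuple follows the (left, top, right, bottom) order that common imaging libraries accept for cropping. A perspective correction based on the same corners would be a natural next step but is not shown here.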
Language Model
Converts the text produced by the optical character recognition engine, which sits between the two models, into JSON. It is fine-tuned from unsloth/Qwen2-0.5B-Instruct-bnb-4bit.
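Since the language model's job is to emit JSON, downstream code typically has to pull that JSON out of the raw reply. The sketch below shows one defensive way to do so with the standard library. The output schema of the fine-tuned model is not described here, so the key `ingredients`, the sample reply, and the helper name `extract_json` are all invented for illustration.

```python
import json


def extract_json(reply: str) -> dict:
    """Parse the outermost {...} object found in a model reply."""
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model reply")
    # json.loads raises json.JSONDecodeError if the span is not valid JSON.
    return json.loads(reply[start:end + 1])


# Hypothetical reply; the real model's output format may differ.
reply = 'Here is the result:\n{"ingredients": ["water", "sugar"]}'
result = extract_json(reply)
```

Slicing from the first `{` to the last `}` tolerates extra text around the object, at the cost of failing when the reply contains several separate objects.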
Usage
Clone the repository and install the dependencies on any Debian-based system:
git clone https://huggingface.co/lenamerkli/ingredient-scanner
cd ingredient-scanner
python3 -m venv .venv
source .venv/bin/activate
pip3 install -r requirements.txt
Note: requirements.txt lists both training and inference requirements, so not all of them are needed for inference.
Select the OCR engine in main.py by uncommenting one of the lines 20 to 22:
# ENGINE: list[str] = ['easyocr']
# ENGINE: list[str] = ['anthropic', 'claude-3-5-sonnet-20240620']
# ENGINE: list[str] = ['llama_cpp/v2/vision', 'qwen-vl-next_b2583']
Note: Qwen-VL-Next is not an official Qwen model. The name is a placeholder that protects the business secrets of a private model.
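The three ENGINE values above appear to share one shape: a backend name, optionally followed by a model identifier. The sketch below reads such a list under that assumption. The helper `split_engine` is hypothetical, and main.py may interpret the list differently.

```python
from typing import List, Optional, Tuple


def split_engine(engine: List[str]) -> Tuple[str, Optional[str]]:
    """Split an ENGINE-style list into (backend, model); model is None when absent."""
    if not engine:
        raise ValueError("ENGINE must name at least a backend")
    backend, *rest = engine
    return backend, (rest[0] if rest else None)


# Values copied from the commented-out lines above.
easyocr_choice = split_engine(['easyocr'])
anthropic_choice = split_engine(['anthropic', 'claude-3-5-sonnet-20240620'])
```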
Run the inference script:
python3 main.py
You will be asked to enter the file path to a PNG image.
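Because the script expects a PNG image, it can help to check a file before passing its path in. The following sketch compares the first eight bytes of a file with the standard PNG signature. The helper `is_png` is an illustration and not part of main.py.

```python
from pathlib import Path

# The fixed eight-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def is_png(path: str) -> bool:
    """Return True if the file exists and starts with the PNG signature."""
    candidate = Path(path)
    if not candidate.is_file():
        return False
    with candidate.open("rb") as handle:
        return handle.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE
```

Checking the signature rather than the file extension catches files that were merely renamed to `.png`.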
Anthropic API
If you want to use the Anthropic API, create a .env file with the following content:
ANTHROPIC_API_KEY=YOUR_API_KEY
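For readers unfamiliar with .env files, the sketch below shows how such a file could be read using only the standard library. How main.py loads the file is not stated here, so `parse_env` is an illustrative helper and not the repository's loader. It handles plain KEY=VALUE lines only, without quoting or variable expansion.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


# Same content as the .env example above, plus a comment line.
settings = parse_env("# Anthropic credentials\nANTHROPIC_API_KEY=YOUR_API_KEY\n")
```

In a real script the file would be read with `Path(".env").read_text()` and the values copied into `os.environ` before the API client is created.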
Citation
Here is how to cite this paper in BibTeX format:
@misc{merkli2024ingriedient-scanner,
title={Ingredient Scanner: Automating Reading of Ingredient Labels with Computer Vision},
author={Lena Merkli and Sonja Merkli},
date={2024-07-16},
url={https://huggingface.co/lenamerkli/ingredient-scanner},
}