7-10 NanoOWL development

Learning Objectives

OWL-ViT is an open-vocabulary object detection (OVD) model that locates objects in an image based on user-defined text prompts.

NanoOWL optimizes the OWL-ViT model using TensorRT, enabling real-time inference on NVIDIA Jetson Orin series platforms.
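An open-vocabulary detector scores each candidate box against every text prompt and keeps only confident matches. The sketch below illustrates that post-processing step with hypothetical boxes, scores, and labels (the data and the `filter_detections` helper are assumptions for illustration, mirroring the `--threshold` flag used later in this guide):

```python
def filter_detections(boxes, scores, labels, threshold=0.1):
    """Keep only detections whose confidence score meets the threshold."""
    return [(b, s, l) for b, s, l in zip(boxes, scores, labels) if s >= threshold]

# Hypothetical model outputs for two prompts: "an owl" and "a glove".
boxes = [(10, 20, 110, 220), (5, 5, 50, 50), (200, 40, 260, 90)]
scores = [0.82, 0.04, 0.31]
labels = ["an owl", "an owl", "a glove"]

for box, score, label in filter_detections(boxes, scores, labels, threshold=0.1):
    print(f"{label}: {score:.2f} at {box}")
```

Lowering the threshold admits more (and noisier) detections; raising it keeps only high-confidence ones.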

 

Initial environment setup

sudo apt update && sudo apt install git -y

git clone https://github.com/NVIDIA-AI-IOT/torch2trt
cd torch2trt
python3 setup.py develop --user

pip3 install packaging
pip3 install transformers
pip3 install onnx
pip3 install numpy==1.26.4
pip3 install matplotlib
pip3 install 'pillow>=10'
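After installing the prerequisites, a quick sanity check can confirm they are importable before moving on. This is a minimal sketch using only the standard library; the package list is assumed to match the pip commands above (pillow imports as `PIL`):

```python
from importlib.util import find_spec

def check_prereqs(names=("packaging", "transformers", "onnx",
                         "numpy", "matplotlib", "PIL")):
    """Map each expected package name to whether it can be imported."""
    return {name: find_spec(name) is not None for name in names}

for name, ok in check_prereqs().items():
    print(f"{name}: {'installed' if ok else 'MISSING'}")
```

Any package reported as MISSING should be reinstalled before building NanoOWL.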

Install and run NanoOWL

# Download the NanoOWL source code and install it
git clone https://github.com/NVIDIA-AI-IOT/nanoowl
cd nanoowl
sed -i 's/image = np.asarray(image)$/image = np.asarray(image).copy()/g' nanoowl/owl_drawing.py
python3 setup.py develop --user

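The `sed` patch above works around a drawing bug: `np.asarray()` on a read-only buffer (such as a PIL image) returns a non-writable array, so code that modifies pixels in place fails until `.copy()` produces a writable array. A small sketch of the underlying behavior:

```python
import numpy as np

# Simulate a read-only source, as np.asarray(pil_image) can produce.
src = np.zeros((2, 2), dtype=np.uint8)
src.flags.writeable = False

view = np.asarray(src)          # shares memory: still read-only
fixed = np.asarray(src).copy()  # owns its memory: writable

print(view.flags.writeable)     # False
print(fixed.flags.writeable)    # True
fixed[0, 0] = 255               # safe only after .copy()
```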
# Generate the optimized image-encoder engine file (*.engine)
mkdir -p data
python3 -m nanoowl.build_image_encoder_engine data/owl_image_encoder_patch32.engine

# Run the demo (the result is written as a JPG file, data/owl_predict_out.jpg)
cd examples
python3 owl_predict.py \
    --prompt="[an owl, a glove]" \
    --threshold=0.1 \
    --image_encoder_engine=../data/owl_image_encoder_patch32.engine
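The `--prompt` argument takes a bracketed, comma-separated list of object labels. A minimal sketch of how such a string can be parsed into individual labels (the `parse_prompt` helper is an illustration, not NanoOWL's actual parser):

```python
def parse_prompt(prompt: str) -> list[str]:
    """Split a bracketed, comma-separated prompt into label strings."""
    inner = prompt.strip().strip("[]")
    return [label.strip() for label in inner.split(",") if label.strip()]

print(parse_prompt("[an owl, a glove]"))  # ['an owl', 'a glove']
```

Each resulting label becomes one text query for the detector, and every reported detection carries the label it matched.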


 

Copyright © 2026 YUAN High-Tech Development Co., Ltd.
All rights reserved.