7-10 NanoOWL Development

Learning Objectives

NanoOWL uses TensorRT to optimize the OWL-ViT model, achieving real-time performance on NVIDIA Jetson Orin series platforms.

OWL-ViT is an open-vocabulary object detection (OVD) model that detects objects in an image based on user-defined text prompts.

Initial Environment Setup

sudo apt update && sudo apt install git -y
git clone https://github.com/NVIDIA-AI-IOT/torch2trt

cd torch2trt

python3 setup.py develop --user
pip3 install packaging 

pip3 install transformers 

pip3 install onnx 

pip3 install numpy==1.26.4 

pip3 install matplotlib

pip3 install 'pillow>=10'
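
Before moving on, it may help to confirm that the packages installed above can actually be imported. The following is a minimal sketch rather than part of the original procedure: `check_module` is a hypothetical helper name, and it assumes `python3` is on the PATH. Note that Pillow is imported under the module name `PIL`.

```shell
# Hypothetical helper: report whether a Python module can be imported.
check_module() {
    if python3 -c "import $1" >/dev/null 2>&1; then
        echo "OK       $1"
    else
        echo "MISSING  $1"
        return 1
    fi
}

# Modules installed in the steps above (Pillow is imported as PIL).
missing=0
for mod in torch2trt packaging transformers onnx numpy matplotlib PIL; do
    check_module "$mod" || missing=1
done

if [ "$missing" -ne 0 ]; then
    echo "At least one module failed to import; re-run the matching install step."
fi
```

Any line marked MISSING points at the install command to repeat before continuing.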

Install and Run NanoOWL

# Download the NanoOWL source code and install it

git clone https://github.com/NVIDIA-AI-IOT/nanoowl

cd nanoowl

sed -i 's/image = np.asarray(image)$/image = np.asarray(image).copy()/g' nanoowl/owl_drawing.py
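
The `sed` command above appends `.copy()` to the `image = np.asarray(image)` assignment in `nanoowl/owl_drawing.py`. The source does not state the reason; presumably it ensures that the drawing code receives a writable array, but that is an assumption. To see what the substitution does without touching the file, the same expression can be applied to a sample line. `patch_line` is a hypothetical name used only for this illustration.

```shell
# Apply the same substitution to text on stdin instead of the real file.
patch_line() {
    sed 's/image = np.asarray(image)$/image = np.asarray(image).copy()/g'
}

# An unpatched line gains the .copy() suffix.
printf '%s\n' '    image = np.asarray(image)' | patch_line

# A line that was already patched does not end with "np.asarray(image)",
# so the $ anchor prevents a second match and the line passes through unchanged.
printf '%s\n' '    image = np.asarray(image).copy()' | patch_line
```

Because of the `$` anchor, running the original `sed -i` command a second time should leave the file as it is.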

python3 setup.py develop --user
# Generate the TensorRT engine file (*.engine) for the image encoder

mkdir -p data

python3 -m nanoowl.build_image_encoder_engine data/owl_image_encoder_patch32.engine
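
The run step depends on the engine file produced by the build command above, so a quick check that the file exists and is non-empty can save a confusing failure later. This is a minimal sketch: `require_engine` is a hypothetical helper, and the path is the one used in the build command.

```shell
# Hypothetical helper: succeed only if the given file exists and is non-empty.
require_engine() {
    if [ -s "$1" ]; then
        echo "found: $1"
    else
        echo "missing or empty: $1" >&2
        return 1
    fi
}

# Run from the nanoowl directory, where the data folder was created.
require_engine data/owl_image_encoder_patch32.engine \
    || echo "Build the engine before running the example."
```
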
# Run the example (the result is written to a JPG file, data/owl_predict_out.jpg)

cd examples

python3 owl_predict.py \
    --prompt="[an owl, a glove]" \
    --threshold=0.1 \
    --image_encoder_engine=../data/owl_image_encoder_patch32.engine
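
The `--prompt` value in the command above is a bracketed, comma-separated list of labels. To try other objects, the string can be assembled from individual labels. The sketch below assumes the script accepts any labels written in the same format as the example; `make_prompt` is a hypothetical helper, not part of NanoOWL.

```shell
# Hypothetical helper: join labels into the bracketed, comma-separated form
# used by the example prompt.
make_prompt() {
    local joined="" label
    for label in "$@"; do
        # Insert ", " before every label except the first.
        joined="${joined:+$joined, }$label"
    done
    printf '[%s]\n' "$joined"
}

make_prompt "an owl" "a glove"
```

The result can be passed directly, for example `--prompt="$(make_prompt "an owl" "a glove")"`.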