9-11 gpt-oss model

Learning Objectives

Use the gpt-oss model via llama.cpp and obtain answers by sending simple prompts. Additionally, access the gpt-oss model through a WebUI.

What is gpt-oss?

gpt-oss is an open-weight large reasoning model released by OpenAI. It can both reason through complex problems and carry out agentic tasks: give it a problem or a command, and it will "think it through" to deliver a solution or to perform tool calls.

What Can gpt-oss Do?

1. Give it a complex problem, and it will derive a solution with rigorous logic.

2. Give it an agentic task, and it will autonomously call external tools to carry it out.

How to Get Started?

1. Build and install llama.cpp; the build process usually takes some time.

git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
mkdir build
cd build/
cmake .. -DGGML_CUDA=ON -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc
make -j$(nproc)
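Before moving on, it can help to confirm that the build actually produced the two binaries used in the later steps. This is a quick sanity check, not part of the original procedure; the paths assume you are still inside the build/ directory created above.

```shell
# Sanity check after the build: the later steps use both of these binaries.
MISSING=0
for bin in bin/llama-cli bin/llama-server; do
  if [ -x "$bin" ]; then
    echo "found $bin"
  else
    echo "missing $bin (re-check the cmake/make output)"
    MISSING=1
  fi
done
```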


2. Download the quantized model.

cd ../../
mkdir -p models
wget \
-O models/gpt-oss-20b-Q4_K_S.gguf \
"https://huggingface.co/unsloth/gpt-oss-20b-GGUF/resolve/main/gpt-oss-20b-Q4_K_S.gguf?download=true"


3. Chat in the terminal.

./llama.cpp/build/bin/llama-cli \
-m models/gpt-oss-20b-Q4_K_S.gguf \
-ngl 40
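Besides the interactive chat, llama-cli can also answer a single prompt and exit, which is handy for scripting. A minimal sketch, using upstream llama.cpp's flags (-p for the prompt, -n for the token limit) and guarded so it degrades gracefully if the binary has not been built yet:

```shell
# One-shot run: -p supplies the prompt, -n caps the number of generated
# tokens, -ngl offloads layers to the GPU.
CLI=./llama.cpp/build/bin/llama-cli
if [ -x "$CLI" ]; then
  "$CLI" -m models/gpt-oss-20b-Q4_K_S.gguf \
    -p "How many 'r's are in 'strawberry'?" \
    -n 128 \
    -ngl 40
else
  echo "llama-cli not found at $CLI - build llama.cpp first"
fi
```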


4. Once executed, you will see the following output.

5. You can then start a conversation by entering your questions. For example, if you input "How many 'r's are in 'strawberry'?", it will produce the following output:

6. Install the required packages for chatting via the WebUI.

curl -LsSf https://astral.sh/uv/install.sh | sh
source ~/.bashrc
uv venv --python 3.11 --seed
uv pip install --no-cache-dir open-webui


7. After installing the packages, start the server in one of the terminal windows.

./llama.cpp/build/bin/llama-server \
-m models/gpt-oss-20b-Q4_K_S.gguf \
--host 0.0.0.0 \
-n 128 \
-ngl 999
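While llama-server is running, it also exposes an OpenAI-compatible HTTP API, so you can query it directly before wiring up the WebUI. A sketch assuming llama-server's default port 8080; the model field is required by the API schema, but llama-server answers with whichever model it has loaded:

```shell
# Send one chat-completion request to llama-server's OpenAI-compatible
# endpoint; falls back to a message if the server is not up.
PAYLOAD='{"model":"gpt-oss-20b","messages":[{"role":"user","content":"How many r letters are in strawberry?"}],"max_tokens":128}'
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD" \
  || echo "llama-server is not reachable on port 8080"
```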


8. Upon success, you will see the following screen.

9. Start Open WebUI in another terminal (also inside the Docker container).

uv run open-webui serve --host 0.0.0.0 --port 8081


10. Upon success, you will see the following screen.

11. Next, open your browser, navigate to http://127.0.0.1:8081, and register an account.

12. Click "Select a model" and then click "Manage Connections."

13. Click the plus (+) icon.

14. Enter the connection information (for example, the llama-server endpoint http://127.0.0.1:8080/v1, with any placeholder API key) and click "Save."

15. Next, select a model and enter your question to generate the following result.

Reference:

gpt-oss – an openai collection (Hugging Face)

ggml-org/llama.cpp: LLM inference in C/C++


Copyright © 2026 YUAN High-Tech Development Co., Ltd.
All rights reserved.