8-3 Meta AI: Llama 3 model

Learning Objectives

Use a Python program to download the Llama 3 model from the Hugging Face platform, and ask Llama 3 questions with simple prompts to get answers.

What is Llama 3?

Llama 3 is a large language model developed by Meta, functioning like an AI brain that can understand text, write articles, and answer questions. It has read a vast amount of text from the internet and learned how to converse naturally and fluently with people.

What Can Llama 3 Do?

1. Write essays / complete sentences: Input a partial text, and the AI will automatically complete the rest.

2. Answer questions: You can ask about history, science, math, or everyday life topics.

3. Generate code examples: It can provide examples for simple programming syntax.

How to Get Started?

1. Go to the Hugging Face Llama 3 page (https://huggingface.co/collections/meta-llama/meta-llama-3-66214712577ca38149ebb2b6) and select meta-llama/Meta-Llama-3-8B-Instruct.

2. Fill in the required information to apply for access and wait for approval.

3. After completing the form, you can check the processing status in your Hugging Face account settings.

4. Upon success, you will see the following screen.

5. The following example code lets Llama 3 generate an answer to the prompt "What is a cat? Describe in one sentence."

(1) Replace the token argument with the Hugging Face access token you generated earlier.

import transformers
import torch
from huggingface_hub import login

# Log in to Hugging Face
login(token="hf_XXXXXXXXXXXXXXXX") # Replace with your own Hugging Face token

# Model name
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

# Create a pipeline for text generation
pipeline = transformers.pipeline(
    "text-generation",  # Specify the task type as text generation
    model=model_id,  # Specify the model ID
    model_kwargs={
        "torch_dtype": torch.bfloat16, # Set the data type of the model (bfloat16)
        "cache_dir": "./model" # Set the path to store the model (default is ~/.cache/huggingface)
    },
    device_map="auto",  # Automatically select device (CPU or GPU)
)

# Set up the conversation messages, with roles and content
messages = [
    {"role": "user", "content": "What is a cat? Describe in one sentence."},  # User asks a question
]

# Define end-of-sequence tokens to identify where generation should stop
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

# Use the pipeline to generate text
outputs = pipeline(
    messages,  # Input messages
    max_new_tokens=256,  # Maximum number of new tokens to generate
    eos_token_id=terminators,  # List of end-of-sequence token IDs
    do_sample=True,  # Enable random sampling for generation
    temperature=0.6,  # Controls randomness of generation
    top_p=0.9,  # Only consider tokens with cumulative probability up to 0.9
)

# Print the result
print(f'result: {outputs[0]["generated_text"][-1]["content"]}')
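Hard-coding the token in source code, as in the example above, risks leaking it if the file is shared. A safer pattern is to read it from an environment variable. The sketch below assumes the token is stored in a variable named HF_TOKEN (a common convention, not a requirement; the get_hf_token helper is our own, not part of the huggingface_hub API):

```python
import os

def get_hf_token(env_var="HF_TOKEN"):
    """Read the Hugging Face token from an environment variable.

    Raises a clear error instead of silently passing an empty token to login().
    """
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Hugging Face token"
        )
    return token

# Then log in as before, without the token in the source file:
# login(token=get_hf_token())
```

On Linux/macOS the variable can be set with `export HF_TOKEN=hf_XXXX...` before running the script.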


6. After execution, the system will download and load the model; the 8B model weights are on the order of 16 GB in bfloat16, so the first run takes a while. Because the model runs without further optimization, generation is relatively slow, especially on CPU, and will require some waiting time.
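Since device_map="auto" in the example falls back to CPU when no GPU is visible, it can be useful to check the device before waiting on a slow run. A minimal check (written so it also works if torch is not installed):

```python
def describe_device():
    """Report which device PyTorch would use, without requiring torch at import time."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if torch.cuda.is_available():
        # A CUDA GPU is visible; the pipeline will run much faster here
        return f"cuda ({torch.cuda.get_device_name(0)})"
    return "cpu (expect slow generation for an 8B model)"

print(describe_device())
```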

7. After running it, you will see a response similar to the following:

Once completed, you can modify the input question in the program and use Llama 3 to assist with writing, problem-solving, or practicing programming!

For example, asking questions like:

1. What are Newton’s three laws of motion?

2. In which year did the Opium War take place?

3. Translate “an apple a day keeps the doctor away” into Chinese, and so on.
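All of the questions above can be asked by changing only the content field of the user message. As a sketch, a small helper that wraps the pipeline call from the earlier example (the ask function name is ours, not part of the transformers API; it takes the pipeline object as a parameter):

```python
def ask(pipe, question, max_new_tokens=256):
    """Send a single-turn question to a text-generation pipeline and return the answer text."""
    messages = [{"role": "user", "content": question}]
    outputs = pipe(
        messages,
        max_new_tokens=max_new_tokens,
        do_sample=True,   # same sampling settings as the earlier example
        temperature=0.6,
        top_p=0.9,
    )
    # The chat-format pipeline returns the full message list; the last entry is the reply
    return outputs[0]["generated_text"][-1]["content"]
```

For example, after building the pipeline as above: `print(ask(pipeline, "What are Newton's three laws of motion?"))`.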

Reference:

meta-llama/Meta-Llama-3-8B-Instruct · Hugging Face (https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)


Copyright © 2026 YUAN High-Tech Development Co., Ltd.
All rights reserved.