8-4 Meta AI: Codellama model

Learning Objectives

Use a Python program to download the CodeLlama model from the Hugging Face platform, and ask Llama 3 questions using simple prompts to get answers.

What is CodeLlama?

CodeLlama is a programming language model developed by Meta AI specifically for code generation and understanding. It can comprehend your programming needs and automatically generate corresponding code snippets.

What Can Code Llama Do?

1. Generate code examples: Automatically produce executable code based on descriptions.

2. Code explanation: Explain complex functions or algorithm logic in simple language.

3. Debugging suggestions: Help identify errors in your code and provide directions for fixes.

How to Get Started?

1. Go to meta-llama/CodeLlama-7b-Instruct-hf (https://huggingface.co/meta-llama/CodeLlama-7b-Instruct-hf) to request access and wait for approval. Once approved, you will see the following page.

2. The following example program will have CodeLlama help complete the twoSum function code.

(1) Please replace token with the Hugging Face token you generated earlier.

import torch

import transformers

from transformers import AutoTokenizer

from huggingface_hub import login
# Log in to Hugging Face

login(token="hf_XXXXXXXXXXXXXXXX") # Replace with your own Hugging Face token
# Model name

model = "meta-llama/CodeLlama-7b-hf"
# Load tokenizer from the model path

tokenizer = AutoTokenizer.from_pretrained(model)

# Create a pipeline for text generation

pipeline = transformers.pipeline(

    "text-generation",  # Specify task type as text generation

    model=model,  # Specify model ID

    torch_dtype=torch.float16,  # Specify model data type (float16)

    device_map="auto",   # Automatically select device (CPU or GPU)

    model_kwargs={"cache_dir": "./model" },  # Set path to store the model (default is ~/.cache/huggingface)

)
sequences = pipeline(

    "def twoSum(nums, target):",

    do_sample=True,  # Enable random sampling

    top_k=10,  # Sample from top 10 highest-probability tokens

    temperature=0.1,  # Controls randomness of generation

    top_p=0.95,  # Only consider tokens with cumulative probability up to 0.95

    num_return_sequences=1,  # Generate one sequence

    eos_token_id=tokenizer.eos_token_id,  # Specify end-of-sequence token

    max_length=200,  # Maximum length of generated sequence (in tokens)

)

for seq in sequences:

    print(f"Result: {seq['generated_text']}")

3. After running the program, you will see the following result

Once completed, you can modify the input prompts in the program and use CodeLlama to help solve coding problems like Palindrome Number, Roman to Integer, and so on.

Reference:

meta-llama/CodeLlama-7b-hf · Hugging Face