8-5 Alibaba: Qwen2.5 Model
Learning Objectives
Download the Qwen2.5 model from the Hugging Face platform using a Python program, and use simple prompts to ask questions and get answers from Qwen2.5.

What is Qwen2.5?
Qwen2.5 is a large language model developed by Alibaba. It can engage in conversations like a chatbot, as well as be used for writing articles or performing text analysis.
What Can Qwen2.5 Do?
1. Article Generation / Summarization: Automatically generate paragraphs based on a topic, or compress long texts into key summaries.
2. Conversational Interaction: Answer questions and discuss various topics like a chat assistant.
3. Text Analysis: Perform sentiment analysis or keyword extraction to help understand the content of articles.
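All three tasks use the same model; only the prompt changes. As a minimal sketch of this idea, the snippet below builds a message list (the standard role/content chat structure used by the program later in this section) for each task. The helper name build_messages and the example prompts are our own illustrations, not part of the Qwen API.

```python
# Each task is just a different "messages" list sent to the same model.
# The role/content structure is the standard chat format used by
# Hugging Face chat templates; the prompts here are illustrative.

def build_messages(system_prompt, user_prompt):
    """Assemble a chat message list from a system and a user prompt."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

# 1. Article generation / summarization
summarize = build_messages(
    "You are a helpful assistant.",
    "Compress the following text into three bullet points: ...",
)

# 2. Conversational interaction
chat = build_messages(
    "You are a helpful assistant.",
    "What is the difference between a CPU and a GPU?",
)

# 3. Text analysis (sentiment)
sentiment = build_messages(
    "You are a sentiment classifier. Answer 'positive' or 'negative'.",
    "I really enjoyed this movie!",
)

print(sentiment[1]["content"])  # I really enjoyed this movie!
```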
How to Get Started?
1. The following example program asks Qwen2.5 to give a brief introduction to large language models.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Set model name
model_name = "Qwen/Qwen2.5-7B-Instruct-1M"

# Load the model, automatically selecting the appropriate Torch data type
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # Automatically choose data type (e.g., bfloat16 or float16)
    device_map="auto",    # Automatically select device (CPU or GPU)
    cache_dir="./model",  # Path to store the model (default is ~/.cache/huggingface)
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir="./model")

# Set the conversation messages, including system and user messages
prompt = "Introduce large language models briefly."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt},
]

# Use the tokenizer to apply the chat template to the message list
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Convert the text to tensors and move them to the model's device
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate a response with the model
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
)

# Remove the input portion from the output to keep only the generated response
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

# Decode the output, skipping special tokens, and print the response
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
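The "remove the input portion" step can be hard to parse at a glance. The snippet below reproduces the same slicing logic with plain Python lists standing in for token-ID tensors: model.generate() returns the prompt tokens followed by the newly generated tokens, so the reply is everything after the first len(input_ids) positions. The numeric IDs are made up for illustration.

```python
# Plain-list illustration of the slicing step in the program above.
input_ids_batch = [[101, 7592, 102]]                # fake prompt token IDs
output_ids_batch = [[101, 7592, 102, 2023, 2003]]   # prompt IDs + generated IDs

# Keep only the tokens that come after the prompt in each sequence.
generated = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(input_ids_batch, output_ids_batch)
]
print(generated)  # [[2023, 2003]]
```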
2. After running the program, the model prints its response. Two notes:
(1) The model has not been optimized, so generation will be relatively slow and will require some waiting time.
(2) You can reduce max_new_tokens to decrease the number of output tokens.

Once completed, you can use Qwen2.5 for Chinese writing, conversations, or text analysis! For example, by modifying the prompt in the program, you can have Qwen2.5 summarize the key points of an article.
prompt = """Organize the highlights of the following article:
Founded more than 100 years ago,
National Taiwan University Hospital (NTUH) has nurtured countless medical professionals
and is known for its trusted clinical care.
The national teaching hospital is now adopting AI imaging technology to diagnose
patients more quickly and accurately.
Trained on more than 70,000 axial images from 200 patients, NTUH's HeaortaNet model
automatically performs 3D segmentation of cardiac computed tomography (CT) scans,
including the aorta and other arteries, to quickly analyze cardiovascular disease risk.
The model's segmentation of the pericardium and aorta is highly accurate, dramatically
reducing the data processing time from one hour to approximately 0.4 seconds per case.
In addition, NTUH, in collaboration with the Good Liver Foundation and system builder
Smartech, developed a diagnostic aid system to detect liver cancer during ultrasound
examinations.
The system utilizes NVIDIA's Jetson Orin NX module and a deep learning model trained
on more than 5,000 labeled ultrasound images,
to identify malignant and benign liver tumors in real-time.
NVIDIA DeepStream and TensorRT SDKs accelerate the system's deep learning models,
ultimately helping clinicians detect tumors earlier and more reliably.
In addition, NTU Hospital is using NVIDIA DGX to train AI models for a system that
detects pancreatic cancer from CT scans.
"""
Reference:
Qwen/Qwen2.5-7B-Instruct-1M · Hugging Face
https://blogs.nvidia.com.tw/blog/taiwan-medical-centers-system-builder-partners/