9-1 NexVDO SDK - QCAP Feature Overview and Architecture
In this first chapter, we will start from a macro perspective of the SDK's block diagram. This comprehensive audio/visual SDK covers four core areas: **Capture, Record, Stream, and Analysis**. However, before diving into these four major blocks, if you closely observe the top and bottom of the block diagram, you will find that this SDK provides ultimate developer-friendliness and hardware acceleration support to meet the needs of different developers:
- Cross-language and framework support: At the very top of the block diagram, you can see that the SDK supports C/C++, Python, and the Qt framework. Whether you are a C/C++ developer pursuing low-level performance, a Python developer focusing on AI models and rapid validation, or a Qt engineer building a professional Graphical User Interface (GUI), you can integrate the SDK using the language you know best.
- Powerful hardware platform and low-level acceleration: At the very bottom of the block diagram, you will find that the entire architecture is built on the NVIDIA® Jetson Orin™ Platform. The SDK integrates NVIDIA's core acceleration technologies, including GPUDirect/RDMA, NVENC/NVDEC, and CUDA computing, and extends to AI inference with TensorRT and TensorRT-LLM. This keeps every step accelerated, from video input to AI analysis.
Official Reference Information: If you want a sneak peek at the complete specifications and product details of this SDK, please feel free to visit the YUAN Official Product Page to learn more!
In future chapters, we will provide technical breakdowns of each of the four topics (Capture, Record, Stream, and Analysis) and bring in practical analysis applications. But in this chapter, let's first take a quick preview of the three most fundamental functional blocks in the diagram: Capture, Record, and Stream.

Capture
Video capture acts as the "eyes" of a system, responsible for efficiently and losslessly transmitting real-world scenes to a computer. Whether in medical surgeries, media broadcasting, or security monitoring, the first and foremost step is always to ensure that video information can be captured completely and with low latency.
This sounds straightforward, but it is often fraught with difficulties during actual development. As a developer, you may face a wide variety of video input sources and must learn a different, complex video framework for each operating system (such as DirectShow and Media Foundation on Windows, or V4L2 and GStreamer on Linux).
To resolve these pain points, YUAN, leveraging 30 years of professional experience, introduced the NexVDO SDK, making "Capture" simpler and more powerful than ever before. Now, let's take a deep dive into the "Capture" block on the far left of the block diagram!
1. Breaking Down Framework Barriers: Handle Everything with Just 4 APIs
According to the underlying design of the block diagram, our SDK supports a highly comprehensive range of **2D/3D video, audio, and VANC capture** sources. Whether you are connecting to a physical capture card, a sound card, a USB camera, or even capturing a virtual desktop or an IP stream, it can handle them all seamlessly.
Even better, developers **only need to call 4 simple APIs to build video and audio capture functions in just 5 minutes**! The SDK also features a built-in "Auto Signal Detection" mechanism; whether a device is plugged in, unplugged, encounters no signal, or undergoes a format change, the system can automatically identify and process it.
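This chapter does not name the four APIs, so the sketch below does not try to reproduce them. Instead, it models the four signal situations listed above (plugged in, unplugged, no signal, format change) as a small state machine, which is roughly the bookkeeping an automatic detection mechanism saves you from writing. Every name here (`SignalEvent`, `CaptureState`, `next_state`, `replay`) is hypothetical and is not part of the NexVDO SDK.

```python
from enum import Enum, auto


class SignalEvent(Enum):
    """The four situations named in the text (hypothetical names)."""
    PLUGGED = auto()
    UNPLUGGED = auto()
    NO_SIGNAL = auto()
    FORMAT_CHANGED = auto()


class CaptureState(Enum):
    """Hypothetical states of one capture channel."""
    DISCONNECTED = auto()
    WAITING_FOR_SIGNAL = auto()
    STREAMING = auto()


def next_state(state, event):
    """Return the capture state that follows `event`."""
    if event is SignalEvent.UNPLUGGED:
        return CaptureState.DISCONNECTED
    if event is SignalEvent.PLUGGED:
        # A device is present, but no valid format has been reported yet.
        return CaptureState.WAITING_FOR_SIGNAL
    if state is CaptureState.DISCONNECTED:
        # Signal events from an absent device are ignored.
        return state
    if event is SignalEvent.NO_SIGNAL:
        return CaptureState.WAITING_FOR_SIGNAL
    # FORMAT_CHANGED: a valid format is known, so (re)start streaming.
    return CaptureState.STREAMING


def replay(events, state=CaptureState.DISCONNECTED):
    """Apply a sequence of events and return the final state."""
    for event in events:
        state = next_state(state, event)
    return state
```

For example, `replay([SignalEvent.PLUGGED, SignalEvent.FORMAT_CHANGED])` ends in `CaptureState.STREAMING`, and a later `SignalEvent.NO_SIGNAL` drops the channel back to `CaptureState.WAITING_FOR_SIGNAL`.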
2. Breaking the Speed Limit: Empowered by NVIDIA GPUDirect and Exclusive Patents
Different industries have extremely stringent requirements for latency. In the medical field, for instance, latency must be kept below 50 milliseconds to avoid risks during critical operations.
To achieve the lowest possible latency, the SDK is deeply integrated with NVIDIA GPUDirect technology. This technology lets high-quality video bypass unnecessary transmission paths and be captured directly into GPU memory, significantly reducing memory-copy overhead and latency. Additionally, with the "Synchronous Capture Patent" and "Multiview Capture Patent" (Surround & Multiview) shown in the block diagram, developing multi-channel and multi-screen capture becomes much easier.
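To get a feel for how tight a 50-millisecond budget is, it helps to express it in frames. The short calculation below is only arithmetic: the 50 ms figure comes from the text above, while the 30 fps and 60 fps frame rates are illustrative assumptions.

```python
def frame_interval_ms(fps):
    """Time between consecutive frames, in milliseconds."""
    return 1000.0 / fps


def frames_within_budget(budget_ms, fps):
    """Whole frame intervals that fit inside a latency budget.

    Integer arithmetic avoids floating-point rounding at exact
    boundaries (50 ms at 60 fps is exactly 3 frames).
    """
    return (budget_ms * fps) // 1000


if __name__ == "__main__":
    for fps in (30, 60):
        print(fps, "fps:",
              round(frame_interval_ms(fps), 2), "ms per frame,",
              frames_within_budget(50, fps), "frame(s) in 50 ms")
```

At 60 fps the whole capture-to-display path has only three frame intervals to work with, and at 30 fps only one, which is why every avoided memory copy matters.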
3. High-Performance Pre-processing and Rendering: The First Magic of Video Entering the System
After the video comes in, it often requires various pre-processing steps. NexVDO SDK features a built-in, powerful, independent CPU/GPU acceleration engine that provides a rich set of video processing capabilities:
- Video Optimization and Deinterlacing: Supports multiple advanced "deinterlace algorithms" (such as Motion Adaptive, Blending, Filter Triangle, etc.) and allows for real-time adjustment of video brightness, hue, contrast, and saturation.
- Flexible Cropping and Overlay: Through the video cropping and scaling engine, you can freely adjust the video size. It also allows you to add real-time text, scrolling text, and pictures to the video, and even perform professional green screen removal (ChromaKey).
- High-Performance Rendering Engine (ThumbDraw): To cope with the common multi-screen display needs in security monitoring, the exclusive ThumbDraw technology can render video from a single device onto multiple screens, or render multiple videos simultaneously onto a single screen. It further supports high-speed display, mirror, region display, as well as advanced 3D display and HDR display.
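To make one of the pre-processing steps above concrete, here is a minimal sketch of the general "blend" approach to deinterlacing, written in plain Python on a tiny grayscale frame. It illustrates the textbook technique only; it is not the SDK's "Blending" implementation, and the function name `blend_deinterlace` is hypothetical.

```python
def blend_deinterlace(frame):
    """Blend each row with the row below it (the last row is kept as-is).

    `frame` is a list of rows; each row is a list of 0-255 luma values.
    Adjacent rows of an interlaced frame come from different fields, so
    averaging them trades vertical detail for fewer comb artifacts.
    """
    height = len(frame)
    out = []
    for y in range(height):
        # Clamp at the bottom edge so the last row blends with itself.
        below = frame[min(y + 1, height - 1)]
        out.append([(a + b) // 2 for a, b in zip(frame[y], below)])
    return out


if __name__ == "__main__":
    # Alternating dark and bright rows mimic two fields that disagree.
    combed = [[0, 0], [100, 100], [0, 0], [100, 100]]
    print(blend_deinterlace(combed))
```

A real engine would run this on the GPU over full-resolution planes; the point here is only how little arithmetic the basic idea needs.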
4. Medical-Grade Snapshot and Recording
In the "Image Snapshot" function of the block diagram, the SDK not only supports continuous snapshot and flexible cropping but can also output in lossless or lossy compression formats such as BMP, JPG, PNG, and TIF. Most notably, for professional medical applications, the SDK perfectly supports DICOM and the latest HL7 protocols, enabling seamless integration of captured and recorded data into PACS and Medical Worklist servers.
Having mastered the perfect video source, what is the next step to properly preserve these precious visuals? In the following pages, we will take you to the right side of the block diagram to explore the equally feature-packed "Record" block.
Record
With a stable capture source in place, the block diagram now moves to the "Record" node. This SDK's definition of recording is by no means just "writing data to a hard drive." It focuses on "High-Performance Encoding and Flexible Recording Design." Let's compare it with the block diagram to see what powerful weapons it has prepared for developers!
1. Format Freedom and Ultimate Hardware Encoding Performance
As a developer, dealing with a variety of audio and video container formats is often the biggest headache. However, our SDK covers almost all mainstream formats on the market, including AVI, MP4, ASF, WMV, MOV, FLV, TS, M3U8, and WAV. Furthermore, for the healthcare industry, it natively supports professional formats such as DICOM / HL7 / WL, and thoughtfully includes practical functions like file repair, automatic thumbnail generation, subtitle embedding, and custom data insertion.
In terms of encoding formats, it supports RAW, MPEG2, H.264, H.265, and the next-generation, highly efficient AV1 video codec, along with AAC audio. Most importantly, it can fully leverage underlying hardware acceleration technologies. Whether you are using Intel® Media SDK, NVIDIA® CUDA/NVENC™, or AMD® VCE™, the SDK lets the GPU handle the heavy lifting of encoding, freeing up the CPU for other work!
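A quick calculation shows why encoding matters so much for recording. The resolution, frame rate, pixel format (2 bytes per pixel, as in a packed 4:2:2 format), and the 8 Mbit/s target bitrate below are illustrative assumptions, not figures from the SDK.

```python
def raw_video_rate_bytes_per_s(width, height, fps, bytes_per_pixel):
    """Uncompressed video data rate in bytes per second."""
    return width * height * bytes_per_pixel * fps


def compression_ratio(raw_bytes_per_s, encoded_bits_per_s):
    """How many times smaller the encoded stream is than the raw one."""
    return raw_bytes_per_s * 8 / encoded_bits_per_s


if __name__ == "__main__":
    raw = raw_video_rate_bytes_per_s(1920, 1080, 60, 2)
    print("raw:", raw / 1e6, "MB/s")
    print("ratio at 8 Mbit/s:", compression_ratio(raw, 8_000_000))
```

Under these assumptions, 1080p60 produces about 249 MB of raw data every second, so an 8 Mbit/s encoded stream has to be roughly 249 times smaller. That is the workload hardware encoders take off the CPU.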
2. Professional-Grade "Multi-Stream" Recording and Audio Mixing
In broadcast television or large-scale surveillance projects, we often need to simultaneously record videos with different resolutions or from different viewing angles. As seen in the block diagram, the SDK specifically provides advanced recording modes:
- Multi-Stream Channel Recording and Multi-Stream Director Recording: Allowing you to easily build a professional director/switcher system while storing multiple signal sources simultaneously.
- Multi-Stream 3D Recording: Providing the most direct support for the preservation of stereoscopic 3D video.
- Multi-Audio Track and Audio Mixing Recording: With video in place, audio certainly cannot be left behind. The SDK allows developers to mix different audio sources or record them on separate tracks, achieving a perfect fusion of audio and video.
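Audio mixing, at its simplest, is sample-by-sample addition with clipping. The sketch below shows that idea for two 16-bit PCM tracks at the same sample rate. It is a generic illustration, not the SDK's mixer, and `mix_pcm16` is a hypothetical name.

```python
PCM16_MIN = -32768
PCM16_MAX = 32767


def mix_pcm16(track_a, track_b):
    """Mix two 16-bit PCM tracks sample by sample, clipping on overflow.

    Both tracks are lists of signed integers at the same sample rate.
    The shorter track is treated as silence once it runs out.
    """
    length = max(len(track_a), len(track_b))
    mixed = []
    for i in range(length):
        a = track_a[i] if i < len(track_a) else 0
        b = track_b[i] if i < len(track_b) else 0
        mixed.append(max(PCM16_MIN, min(PCM16_MAX, a + b)))
    return mixed


if __name__ == "__main__":
    microphone = [1000, 30000, -30000]
    line_in = [2000, 10000, -10000, 5]
    print(mix_pcm16(microphone, line_in))
```

Recording on separate tracks is the alternative: instead of summing the samples, each source is stored untouched so the balance can be decided later.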
3. The "Time Machine" and Security Mechanisms to Never Miss a Critical Moment
This is what I personally consider the coolest feature in the recording block, and also the most indispensable function in security and mission-critical tasks!
- Pre-event Recording and Time-Shift Recording: In a surveillance system, it is often too late to start recording when an alarm is triggered (e.g., a break-in). The SDK features built-in, powerful "Pre-event Recording," "Time-Shift Recording," and "Rewind Playback" mechanisms. It can encapsulate and record the footage from seconds (or even longer) before the event occurred, truly ensuring nothing is missed!
- Military-Grade Protection: For highly confidential video (such as technology-assisted law enforcement or medical surgery records), the SDK provides an "Encrypted Recording" function. At the same time, to prevent hard drive damage from ruining your hard work, it also has built-in "Synchronous Recording" and "Duplicate Recording (Backup Recording)" for double or even triple protection of your data.
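The idea behind pre-event recording can be sketched with a fixed-size ring buffer: the most recent frames are always kept in memory, and when an alarm fires they are flushed to the recording ahead of the live frames. The class below is a minimal illustration under that assumption; `PreEventRecorder` and its methods are hypothetical names, not SDK APIs, and frames are stood in for by plain integers.

```python
from collections import deque


class PreEventRecorder:
    """Keep the last N seconds of frames and prepend them on trigger."""

    def __init__(self, fps, pre_event_seconds):
        # A deque with maxlen silently drops the oldest frame when full.
        self._buffer = deque(maxlen=fps * pre_event_seconds)
        self.recording = False
        self.saved = []  # stands in for the file being written

    def push(self, frame):
        """Feed one captured frame into the recorder."""
        if self.recording:
            self.saved.append(frame)
        else:
            self._buffer.append(frame)

    def trigger(self):
        """Alarm fired: flush buffered frames, then record live ones."""
        if not self.recording:
            self.saved.extend(self._buffer)
            self._buffer.clear()
            self.recording = True


if __name__ == "__main__":
    recorder = PreEventRecorder(fps=2, pre_event_seconds=2)
    for frame_number in range(10):
        recorder.push(frame_number)
    recorder.trigger()
    recorder.push(10)
    recorder.push(11)
    print(recorder.saved)
```

With a 2-second buffer at 2 fps, the saved output starts at frame 6, four frames before the trigger, even though recording was "off" when those frames arrived.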
Seeing this, don't you also think that the "Record" function of this SDK is far more powerful than imagined? It not only offers fast encoding and broad format support but also anticipates broadcast-grade director requirements and stringent security mechanisms for developers. After capturing, packaging, and recording the video, the next step is to break the limitations of space and transmit these images to the world in real time! In the next section, we will continue moving to the right side of the block diagram to reveal the third puzzle piece: the "Stream" module, which focuses on ultra-low latency and full protocol support.
Stream
Welcome to the third part of our SDK block diagram analysis! In the previous section, we learned how to securely encapsulate and preserve high-quality video through the "Record" function. But the value of video shouldn't just stay on a local hard drive, right? Now we will turn our attention to the third core block of the diagram, "Stream", to see how this SDK breaks the barriers of devices and space, pushing audio and video footage to the world in real time.
To build a low-latency, high-quality live streaming application, the biggest pain point is often the need to deeply understand various complex underlying network protocols. But don't worry, with this NexVDO SDK, you will gain unprecedented development freedom! Let's look at the "Stream" block in the diagram and decode its powerful network transmission firepower one by one.
1. Total Dominance! Tailor-Made Communication Protocols for All Industries
As seen in the block diagram, our SDK supports almost all mainstream streaming protocols on the market and perfectly addresses the specific needs of different industries:
- Live Broadcasting and New Media (RTMP / SRT / HLS): Built-in QoS encoding control ensures that whether you are developing for esports live streaming or influencer platforms, you can stably push footage to major CDN servers via RTMP, or utilize the HTTP-based HLS protocol to allow seamless pull and viewing across all cross-platform devices (such as mobile phones and tablets).
- Security Surveillance (ONVIF / RTSP): The SDK not only supports RTSP but also features a built-in ONVIF server, client, and enumerator certified by the official association. This means that by applying this SDK, your software or hardware can easily communicate with IP cameras and CMS surveillance systems from major brands on the market.
- Video Conferencing and Online Education (WebRTC / SIP): Perfectly covers WebRTC and SIP protocols, allowing developers to implement P2P real-time audio/video conversations and chat room functions in web browsers without forcing users to install additional plugins, effortlessly solving annoying firewall traversal issues.
- Professional Broadcast Television (TS / NDI): For the broadcast field, which requires extremely high transmission stability, the SDK supports MPEG2-TS streaming and broadcast-grade NDI (including lossless Full NDI and low-bandwidth NDI®|HX). This enables cross-regional collaboration and multi-signal director/switcher functions with bidirectional transmission in a 1 Gbps network environment. Additionally, the block diagram shows support for Dante AV-H, the AV-over-IP extension of the Dante professional audio network.
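Of the protocols above, HLS is the easiest to picture, because the player simply downloads a text playlist (the M3U8 file) and then the media segments it lists over HTTP. The sketch below builds such a media playlist by hand to show what the format looks like. It is a generic illustration of the playlist format, not how the SDK generates it, and the function name and segment file names are hypothetical.

```python
import math


def build_media_playlist(segments, first_sequence=0, ended=False):
    """Build an HLS media playlist from (uri, duration_seconds) pairs."""
    # The target duration must be an integer no smaller than any segment.
    target = max(math.ceil(duration) for _, duration in segments)
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target}",
        f"#EXT-X-MEDIA-SEQUENCE:{first_sequence}",
    ]
    for uri, duration in segments:
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(uri)
    if ended:
        # Only a finished (non-live) stream carries the end marker.
        lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_media_playlist([("seg0.ts", 4.0), ("seg1.ts", 3.5)]))
```

For a live stream, the server keeps rewriting this file with the newest segments and a growing media sequence number, and players keep re-fetching it.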
2. Pursuing Ultimate Ultra-Low Latency and Powerful Server Architecture
Beyond supporting a wide range of protocols, transmission speed is the soul of streaming applications. From the details in the block diagram, you can find that our RTSP protocol boasts an astonishing 4-millisecond (4ms) ultra-low latency capability! Furthermore, for stringent applications like "remote medical surgery" that cannot tolerate a single moment of stuttering, we have developed the exclusive SkyLink X technology, which can achieve sub-millisecond latency over the internet with stable bandwidth, making it no longer a sci-fi scenario for doctors to remotely and precisely operate robotic arms.
In terms of system architecture design, the SDK provides powerful 2D/3D Streaming Clients and Multi-Stream Streaming Servers:
- Streaming Client: Supports snapshot, recording, display, and even direct PTZ (Pan-Tilt-Zoom) camera control.
- Multi-Stream Streaming Server: Supports delayed live broadcasting, transmission encryption, and audio mixing. It also provides developers with diversified transmission methods such as UDP, TCP, HTTP, Multicast, and RAW-UDP, ensuring that video can be smoothly delivered under any demanding network environment.
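As a small taste of what sits underneath the transport options listed above, the sketch below sends a few payloads over plain UDP on the local loopback interface using only Python's standard `socket` module. Because UDP itself gives no ordering guarantee, each datagram carries a sequence number so the receiver can restore the order. The 4-byte header layout and all function names are assumptions made for this illustration; this is not the SDK's RAW-UDP format.

```python
import socket
import struct

# Assumed header: one 4-byte big-endian sequence number per datagram.
HEADER = struct.Struct("!I")


def send_packets(sock, address, payloads):
    """Send each payload as one datagram, prefixed with its sequence number."""
    for seq, payload in enumerate(payloads):
        sock.sendto(HEADER.pack(seq) + payload, address)


def receive_packets(sock, count):
    """Receive `count` datagrams and return the payloads in sequence order."""
    received = {}
    for _ in range(count):
        data, _sender = sock.recvfrom(2048)
        (seq,) = HEADER.unpack_from(data)
        received[seq] = data[HEADER.size:]
    return [received[seq] for seq in sorted(received)]


def loopback_demo(payloads):
    """Send payloads to ourselves over 127.0.0.1 and read them back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as rx, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as tx:
        rx.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        rx.settimeout(2.0)         # fail fast instead of blocking forever
        send_packets(tx, rx.getsockname(), payloads)
        return receive_packets(rx, len(payloads))


if __name__ == "__main__":
    print(loopback_demo([b"frame-0", b"frame-1", b"frame-2"]))
```

On a real network, datagrams can also be lost, which is where the choice between UDP, TCP, and the higher-level protocols starts to matter.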
Ready to Dive into Practice?
Through "Capture", "Record", and "Stream", we have fully unlocked the three core audio and video processing foundations of this SDK. Having understood the macro architecture and powerful underlying technologies, it's time to get our hands dirty with code and turn these features into actual applications!
Since our SDK supports the Qt framework, we can build cross-platform, attractive Graphical User Interfaces (GUI) very efficiently. Therefore, in the upcoming Chapter 10-2, we will take a brief pause from the theory lessons and dive straight into the practical phase that developers care about the most: guiding you step by step, from scratch, through the Installation and Setup of the Qt Development Environment.
Get your keyboards ready, and we'll see you in the next hands-on chapter!