NexQloud Knowledge Base


Cloud vs. Edge Inference: Choosing the Right Deployment

The decision of where to run inference is a critical architectural choice, balancing latency, cost, and privacy.

Cloud Inference (on NexQloud)
Inference runs on powerful, scalable servers in our data centers.

  • Modes:
    • Real-time (Online) Inference: For user-facing applications requiring immediate feedback (e.g., chatbots, fraud detection). Demands low latency.
    • Batch Inference: For processing large volumes of data at once when immediate results aren't needed (e.g., daily sales forecasting, analyzing overnight log files). Highly cost-effective.
  • Ideal For: Complex models, massive scale, and applications where data can be securely sent to the cloud.
  • NexQloud Advantage: Access to high-performance AI accelerators (GPUs/TPUs), automatic scaling, and seamless integration with our data and analytics services.
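The contrast between the two cloud modes can be sketched in plain Python. The `score` function, the transaction fields, and the risk thresholds below are invented for illustration only; they are not part of any NexQloud SDK.

```python
# Illustrative sketch: the same model served in two cloud modes.
# `score` stands in for any trained model's predict function.

def score(transaction: dict) -> float:
    """Toy fraud score: flag large transactions from new accounts."""
    risk = 0.0
    if transaction["amount"] > 1000:
        risk += 0.6          # large transaction
    if transaction["account_age_days"] < 30:
        risk += 0.3          # newly created account
    return min(risk, 1.0)

def realtime_inference(transaction: dict) -> float:
    """Online mode: score one event the moment it arrives (low latency)."""
    return score(transaction)

def batch_inference(transactions: list[dict]) -> list[float]:
    """Batch mode: score a whole day's events in one pass (low cost)."""
    return [score(t) for t in transactions]
```

The online path optimizes for per-request latency; the batch path amortizes fixed costs over many records, which is why it is the cheaper choice when results can wait.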

Edge Inference
Inference runs directly on a local device (e.g., a smartphone, camera, or IoT sensor).

  • Ideal For: Applications where low latency, data privacy, or offline operation is non-negotiable.
  • Key Benefits:
    • Near-Zero Latency: Essential for autonomous vehicles or real-time industrial control.
    • Enhanced Privacy: Sensitive data (e.g., medical images) never leaves the device.
    • Offline Operation: Functions without a constant internet connection.
    • Reduced Bandwidth Costs: Only results or alerts are sent to the cloud, not raw data.
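The reduced-bandwidth benefit above can be sketched as a simple filter: the device runs inference locally and uploads only alerts, never the raw readings. The `infer_on_device` model, the reading format, and `ANOMALY_THRESHOLD` are assumptions made for this example.

```python
# Illustrative edge pattern: infer locally, upload only alerts.

ANOMALY_THRESHOLD = 0.8  # assumed alert cutoff for this sketch

def infer_on_device(reading: float) -> float:
    """Stand-in for an on-device model: map a raw reading to a score."""
    return min(reading / 100.0, 1.0)

def filter_for_upload(readings: list[float]) -> list[dict]:
    """Keep only the readings whose score crosses the alert threshold."""
    alerts = []
    for index, reading in enumerate(readings):
        score = infer_on_device(reading)
        if score >= ANOMALY_THRESHOLD:
            alerts.append({"index": index, "score": score})
    return alerts
```

With this pattern, a camera or sensor streaming thousands of readings per hour sends the cloud only the handful that matter, which is where the bandwidth savings come from.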