Engineer - AI Inference Performance Job at Huawei Technologies Canada Co., Ltd., Waterloo, ON

  • Huawei Technologies Canada Co., Ltd.
  • Waterloo, ON

Job Description

Huawei Canada has an immediate 12-month contract opening for an Engineer.

About the team:

The Intelligent Complex Systems Team, currently a part of the Waterloo Research Centre, examines recent advancements in artificial intelligence (AI) and robotics to determine their potential for broader applications. This innovative team researches AI challenges such as matching human capabilities and ensuring the safety of collaborative AI systems.

About the job:
  • Develop and maintain real-time and historical performance monitoring tools for AI inference workloads, including profiling tools for various AI model types (small models, LLMs, VLMs, and multimodal systems) in applications like conversational AI, video processing, and real-time analytics.
  • Analyze and classify inference workloads based on characteristics such as prefill and decode behavior, pre/post-processing overheads, and computational complexity to develop tailored optimization strategies.
  • Develop performance models that consider the system-level factors of AI inference, including model size, architecture (e.g., transformers, CNNs), application-specific constraints (e.g., latency for conversational AI), and compute resource characteristics (GPU, TPU, CPU, and specialized accelerators).
  • Optimize inference workloads across various hardware resources by reducing latency, minimizing memory overhead, and improving throughput. Techniques include quantization, pruning, fusion, and caching. Ensure that models can scale efficiently across diverse compute platforms, from edge devices to large-scale cloud infrastructures.
  • Lead the creation of benchmarks for different types of inference tasks. Use tools such as NVIDIA Nsight, PyTorch Profiler, and TensorBoard to gain insights into inference performance across diverse hardware platforms.
  • Conduct benchmarking and performance comparisons across various hardware platforms (e.g., GPUs, TPUs, edge accelerators) to identify bottlenecks and optimization opportunities. Provide recommendations for software and hardware improvements based on inference throughput, latency, and power consumption.
  • Work closely with AI research, software engineering, and DevOps teams to improve the end-to-end AI inference pipeline, ensuring optimized deployments across different production environments. Collaborate with system architects to incorporate resource-aware optimizations into design practices.
  • Develop strategies to ensure the scalability of inference workloads in production environments, considering both model performance and resource scaling, whether in on-premises environments, cloud infrastructure, or edge computing devices.

About the ideal candidate:

  • Ph.D. or Master’s degree in Computer Science, Electrical Engineering, Machine Learning, or related field.
  • At least 5 years of experience in AI/ML engineering with a focus on inference performance, workload analysis, and system optimization.
  • Extensive experience with AI frameworks (e.g., TensorFlow, PyTorch, ONNX) and model optimization techniques (e.g., quantization, pruning, kernel fusion, and hardware-aware tuning).
  • Proficient with profiling tools (e.g., TensorBoard, PyTorch Profiler, NVIDIA Nsight) and workload analysis for diverse AI models and applications.
  • Expertise in optimizing small models, large language models (LLMs), vision-language models (VLMs), and multimodal models for inference.
  • Strong programming skills in Python, C++, and CUDA, and experience with low-level hardware performance tuning.
  • Familiarity with performance modeling methodologies and frameworks for predicting inference workload performance under varying conditions.
  • Proven expertise in data parallelism, model parallelism, pipeline parallelism, and other distributed-computing strategies for improving performance at scale.

