About Us
Applications built on top of machine learning are taking over the world, but they hit scaling walls driven by resource costs and poor energy efficiency. We’re building cutting-edge software infrastructure to reduce costs and power high-performance machine learning and AI applications at scale. You can be part of a team tackling low-level performance challenges, optimization problems, or data and model/application engineering, depending on what you find most exciting.
Work on One of the Following Focus Areas
Low-Level Building Blocks
- Measure device performance and optimize memory usage for AI workloads.
- Experiment with compiler optimizations, custom operators, or hardware-specific libraries.
- Co-optimize scheduling, communication, and compute to increase GPU utilization.
Inference Optimization
- Analyze and benchmark inference frameworks (e.g., vLLM, TensorRT).
- Identify bottlenecks in end-to-end pipelines and optimize to reduce latency and cost.
- Contribute to scheduling, load balancing, and parallelization strategies for multi-node environments.
Model & Data Engineering
- Collect, clean, and curate datasets to improve model quality for specific tasks.
- Enhance or modify tokenization strategies for better vocabulary coverage or faster inference.
- Fine-tune large language models (LLMs) and vision-language models (VLMs) on specialized datasets and realign them for domain-specific use cases.
Knowledge Processing & Understanding
- Build extraction + normalization pipelines to turn messy inputs into queryable records.
- Evaluate knowledge representations that support retrieval, grounding, and analytics.
- Turn complex questions into executable plans and explore the solution space of static versus adaptive strategies.
Who You Are
- Currently pursuing a degree (Bachelor’s or Master’s) in Computer Science, Electrical Engineering, or a related field.
- Solid programming skills in Python and/or C/C++/Rust (depending on your focus area).
- Familiar with ML fundamentals—whether through coursework, personal projects, or research. We’ll learn the more difficult topics together.
- Enthusiastic about learning: You enjoy diving into complex systems, reading documentation, and pushing the limits of hardware or models.
What You’ll Do
- Collaborate & learn: Pair with engineers and researchers to design experiments, implement features, and interpret performance data.
- Prototype & iterate: Rapidly build proof-of-concepts, benchmark results, and iterate toward production-quality solutions.
- Document & share: Maintain clear documentation of your findings, methods, and recommendations.
Nice-to-Have Interests (Not Required)
- Low-Level Building Blocks: CUDA/OpenCL experience, compiler knowledge.
- Inference Optimization: Familiarity with PyTorch, workload scheduling, or distributed training/inference.
- Model & Data Engineering: Experience with tokenizers, fine-tuning approaches, or data augmentation/curation pipelines.
- Knowledge Processing & Understanding: Familiarity with search & retrieval techniques, agentic frameworks, or knowledge representation and reasoning.
Why Join Us?
- Hands-on impact: Your work may go directly into production workflows.
- Mentorship & growth: Work alongside seasoned ML engineers and researchers, receiving dedicated mentorship.
- Cutting-edge projects: Operate at the intersection of AI research and next-gen infrastructure.
- Long-term potential: Successful interns have opportunities to join us full-time.