Job Description
The data platform is currently built on a combination of Python, PostgreSQL, dbt, Dagster, and cloud services (AWS & GCP). You'll have the opportunity to expand and transform these services to support our ambitious growth plan.
You'll play a crucial role in the tools and processes that allow us to:
Continuously collect large datasets from various sources, then filter, sort, process, store, and route the data into our training pipelines, R&D experiments, and analytics solutions. Importantly, we expect to leverage AI-agent pipelines to ingest messy data locked in documents and images.
Support data access for our R&D team by contributing to our ETL processes (APIs, dbt, PostgreSQL) and our core Python data-access library: pnx.
Expand our data monitoring and data-quality control using pipelines, models, dashboards, alerts, tracing products, etc.
Efficiently train new models, evaluate them, and release them.
Serve frequen...