Overview
The Data Pipeline Engineer owns the systems that manage all data moving through pocstock, from intake through processing to delivery. This role combines data engineering, AI tooling, and quality control to ensure data flows reliably and is structured correctly at every stage.
Success in this role means:
- Building systems that reliably move data from intake to delivery without breakdowns
- Maintaining high data quality and consistency across all stages
- Enabling scalable handling of large volumes of content and datasets
Role & Responsibilities
- Build and maintain systems that manage data intake, processing, and delivery
- Handle large volumes of images, video, and metadata across workflows
- Integrate AI tools for tagging, classification, and data enrichment
- Monitor systems and resolve issues quickly to prevent delays
- Ensure data is organized, structured, and consistent across all stages
- Support dataset packaging and delivery to customers
- Work with operations and project teams to improve workflows
- Identify inefficiencies and improve systems for scale
How We Measure Success
- Reliable, uninterrupted flow of data across systems
- High data quality and consistency
- Minimal errors or data loss
- Efficient handling of increasing data volume
- Reduced manual work through automation
- Strong coordination with internal and external teams
Requirements
- 2–5+ years of experience in data engineering, backend systems, or similar roles
- Strong experience with Python and development environments
- Experience working with AWS S3 or similar storage systems
- Familiarity with FTP/SFTP and handling large media files
- Understanding of datasets, data structures, and processing workflows
- Comfort using AI tools such as ChatGPT or Claude
- Experience working with image, video, or other large datasets (preferred)
- Strong problem-solving and systems-thinking skills