Truthset, Inc. Data Engineer Oakland, CA · Full time

We're looking for a Senior Data Engineer to join a growing startup!

About Truthset, Inc.

Truthset is a dynamic and innovative data intelligence company that specializes in validating the accuracy of the world's consumer data to empower data-driven decision-making and marketing success.

Description


Job Title: Data Engineer


Location: San Francisco Bay Area / Remote US


Who We Are: Truthset is a venture-backed SaaS startup solving the multibillion-dollar problem of data quality for the entire marketing industry. Our platform enables brands and publishers such as Paramount, Procter & Gamble, and TransUnion to optimize consumer data quality, improving marketing ROI. In a fast-paced and collaborative environment, we are committed to excellence and innovation.


Our Tech Stack: AWS (EMR, EC2, S3, Athena, SageMaker), Spark, dbt, Snowflake, Databricks, Airflow, Terraform, GitHub, Tableau.


Our Programming Languages: Scala, Python, SQL, and Bash. 


Who You Are:

A driven individual excited about joining a small but growing Data Science and Engineering team. You’ll report to the Head of Data Science and work alongside a Data Scientist and a Principal ML Engineer. You have a deep understanding of data engineering principles and past work experience designing, building, and maintaining data pipelines in cloud environments.


Responsibilities:

  • Design, build, and maintain scalable data pipelines that supply big data to internal and external teams.
  • Automate the delivery of terabytes of structured data to a growing group of enterprise clients.
  • Automate the ingestion of terabytes of external data sources into internal data warehouses in different environments (e.g., AWS, Snowflake, Databricks). 
  • Write, test, debug, and optimize custom Scala code for ETL workflows and other one-off tasks. 
  • Deploy ETL code in the cloud using batch orchestration tools such as Airflow.
  • Work closely with the Head of Data Science and Principal ML Engineer to test and deploy new infrastructure for data processing.
  • Create an internal toolkit (KPIs, testing programs, dashboards) to monitor the health of data pipelines.
  • Maintain documentation about generated datasets (data dictionaries, feed specs, etc.) for internal and external use.
  • Advise the Head of Data Science on future tooling upgrades.



Core Qualifications:      

  • Bachelor's degree in Computer Science, Mathematics, Statistics, or a related field.
  • 3+ years of relevant work experience. 
  • Proficiency in one or more programming languages such as Python, Scala, Java, or other languages commonly used in data engineering.
  • Experience with cloud/distributed computing tools, including Spark, AWS EMR, and cloud-based data warehouse platforms such as Snowflake, Databricks, or Redshift.
  • A strong background in at least one of the following: distributed data processing, software engineering of data services, or data modeling.
  • Experience with relational (SQL) databases and graph databases.
  • Experience with version control software, such as GitHub.
  • Excellent communication and collaboration skills.
  • Strong problem-solving skills and attention to detail.



Ideal Qualifications:

  • Industry experience programming in Scala.
  • Familiarity with a scripting language like Python or R. 
  • Familiarity with Terraform and Airflow.
  • Familiarity with dbt.



Compensation:

The compensation package will include full health benefits, a 401(k), and the potential for an equity stake.


Contact: 

To apply, please email a CV and (optional) cover letter to [email protected]


Salary

$150,000 - $180,000 per year