Senior Data Engineer

Digital Biology

Data Science
Watertown, MA, USA
Posted on Tuesday, December 26, 2023

Digital Biology is building a precision measurement platform to accelerate the development of precision therapies. We focus on integrating technologies that map biological interactions in their native tissue context, and on deploying these technologies at the scale needed to understand and impact human disease. Our platform has broad applicability, from elucidating mechanisms of action and identifying biomarkers to screening genetically encoded systems at scale.

Digital Biology is a rapidly growing seed-stage company backed by top VCs and partnering with the world’s top pharmaceutical companies. Our interdisciplinary team of biologists and engineers is passionate about bringing the future of biological measurement to life, and we are looking for someone to work alongside us to make that a reality.

Role Overview

Digital Biology is growing rapidly, and we are looking for a senior-level data/backend engineer who is excited about contributing to cutting-edge biological research. Our core technology is in spatial biology: we collect paired molecular sequencing and image data from intact biological tissues, and use these data to deliver key insights in areas like drug discovery, vaccine research, and cancer therapies.

To transform our raw data into actionable biological insights that we can deliver to partners, we rely on our Data Team to write, test, validate, and deploy complex data ingestion, analysis, and reporting pipelines. You would join a team of experienced Software Engineers, Biologists, and Mathematicians dedicated not just to building new data analysis capabilities, but also to developing the data governance strategy that will shape which analyses are possible in the future. In this role, you will develop data warehousing solutions and design new analytical query platforms that will unblock analysis of never-before-seen types of biological data containing both rich “spatial” mapping information and sequence-based genetic information.

In addition to our core data products, our Data Team is responsible for developing several real-time analytical tools for use in the lab. Because the experiments we specialize in are highly dynamic, these internal tools are not just “fire-and-forget” data pipelines attached to real-time dashboards. Instead, we aim to give our computationally savvy lab scientists a flexible programming interface that lets them write code to optimize their workflows on a per-experiment basis. We are looking for highly collaborative individuals who would be excited to partner closely with our Scientists to develop high-impact tooling that interfaces directly with our existing LIMS (laboratory information management system).

Core Responsibilities

  • Develop, test, and maintain data warehousing solutions to ingest imaging, sequencing, and other complex data types.
  • Design, deploy, and maintain custom query engines for analyzing spatially resolved sequencing data.
  • Collaborate directly with scientists to build real-time, interactive data analysis pipelines for use during experiments.
  • Design, write, test, and deploy code to extract raw data from the extremely heterogeneous hardware and software environments associated with our lab equipment.
  • Continuously improve the reliability and scalability of both the on-premises and cloud infrastructure backing our data storage and processing pipelines.
  • Help establish a culture of writing clean, efficient, and maintainable code, both by leading by example and by mentoring junior engineers through code review.

Additional Responsibilities

Depending on your background and experience, additional projects may be available, including:

  • Implement and optimize new, multi-modal statistical analyses.
  • Develop tooling for real-time data visualization, both internally and client-facing.

Required Qualifications

  • MS in Computer Science or related field, or comparable industry experience.
  • Strong software engineering experience, with a thorough grasp of IT concepts, design and development tools, system architecture, technical standards, and shared software concepts.
  • Experience designing, deploying, and managing relational database systems.
  • Experience writing best-effort data pipelines, establishing data governance plans, and enforcing constraints on storage and compute utilization.
  • Proven understanding of database modeling concepts (entities/tables, relations/constraints, and attribute/column data types), with proficiency in SQL.
  • Experience with both cloud and on-premises (especially real-time) data processing pipelines.
  • Fluency in Python and SQL.
  • Fluency with Git and DevOps best practices.

Preferred Qualifications

  • PhD in a quantitative field.
  • Experience with both AI-based and “traditional” image analysis techniques.
  • Molecular biology research experience is highly desirable.
  • Experience dealing with bioinformatics data is a plus.
  • Experience with spatial databases is a plus.

Company Benefits: Health, vision, and dental insurance + 401(k) plan

If you don't meet all of the requirements listed here, we still encourage you to apply or reach out to us. No job description is perfect, and we may find an even more suitable opportunity that is a better fit for you.