Live online training

Fundamentals of Data Architecture

Bridging the Gap between Doer and Visionary

Ted Malaska

Most organizations have developed processes and practices for data management and for developing large software projects. While many of these processes and practices are still relevant and valuable, the dramatic growth in the volume and variety of data, along with new tools to manage this data, has left these same organizations struggling to adapt to the new landscape. The challenges include understanding how to evaluate new data management systems, how to staff projects to ensure success, and how to evaluate and manage risks when working with these new systems.

Ted Malaska shares guidelines and practices to provide a path through the process of developing data projects from planning to implementation. You’ll leave with insights on managing and delivering your own successful data projects based on Ted’s years of experience working with multiple companies and customers.

Topics include:

  • Starting the planning process by understanding the key data project types
  • Selecting data management software in the new enterprise data space
  • Managing project risk, including technology risk, team risk, and requirements risk
  • Ensuring the integrity of data throughout your entire data pipeline
  • Ensuring the integrity of data through effective data governance and data management

What you'll learn and how you can apply it

By the end of this online course, you'll understand:

  • Best practices for delivering successful data projects

And you'll be able to:

  • Break down a large-scale project into executable components
  • Build an effective strategy for evangelism
  • Earn the respect of fellow engineers
  • Influence design and direction without having to force top-down decisions
  • Master the relationship with project management and product management

This training course is for you because...

  • You're a technical lead, architect, manager, CTO, CDO, CIO, or developer working on data projects


Prerequisites

  • Familiarity with data management concepts and systems such as relational databases
  • Experience building large software projects and knowledge of newer data management systems such as Hadoop or Cassandra (useful but not required)

About your instructor

  • Ted Malaska is the director of engineering for data streaming and persistence at Capital One. Previously, he was on the Battle.net team at Blizzard Entertainment; a principal solutions architect at Cloudera, where he helped clients succeed with Hadoop and the Hadoop ecosystem; and a lead architect at the Financial Industry Regulatory Authority (FINRA). He has contributed code to Apache Flume, Apache Avro, Apache YARN, Apache HDFS, Apache Spark, Apache Sqoop, and many more. Ted is the coauthor of Hadoop Application Architectures, a frequent conference speaker, and a blogger on data architectures.


Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

  • To be a great engineer
  • Reviewing the 9-box review process
  • Types of architects: An introduction
  • Top-down vs bottom-up
  • Staying one step ahead
  • Looking for risk and issues
  • Data modeling (20 minutes)
  • Tricks to dig into an existing system
  • Dealing with performance problems
  • Finding a vision
  • How to make a PowerPoint that is remembered