Daniel Linstedt, Building a Scalable Data Warehouse with Data Vault 2.0
D**N
Outstanding coverage of a Revolutionary Data Architecture
A wonderful Data Vault book! Mr. Dan Linstedt has found an excellent co-author in Michael Olschimke. I read this one cover to cover, and I highly recommend it to Data Vaulters old and new.
The fact that this book includes detailed implementation guidance for Data Vault via the Microsoft BI stack should not discourage non-Microsoft industry people from reading it. Here’s why: as Data Vault has matured and evolved as a methodology, no book besides this one has covered the state of the art with such a combination of narrative clarity and technical depth. Also, Microsoft is a fine BI platform.
Moreover, the DV 2.0 method details, including the transition to MD5 hash keys (vs. auto-numbers), are well documented right down to the level of code samples. The ample ETL code samples are also of enormous benefit. With them, an ETL engineer can fairly quickly appreciate not only how to load the Data Vault (pretty easy), but also how to efficiently pull data out of it (harder, admittedly) and load downstream layers with business rules and/or fact and dimension tables or views.
Returning for a moment to the Microsoft-specific implementation sections of this book: it’s exciting to see this content, and it reminds me of when The Kimball Group published “The Microsoft Data Warehouse Toolkit: With SQL Server 2008 R2 and the Microsoft Business Intelligence Toolset”, in which they so aptly described how to implement their compelling dimensional modeling message on the Microsoft BI platform of that time. Soon afterwards, Microsoft teams were running fast with dimensional models, and the rest is history. I hope that this excellent book will help create similar traction for Data Vault.
When armies of Microsoft implementers are using a given method, it has truly hit the mainstream.
Underneath Data Vault’s time-tested methods for handling real-world data ugliness (with, oh yes, enforced referential integrity, and without breaking), for providing logical interoperability between increasingly disparate source data, and for meeting the need for fast, easily parallelized loading, lies an elegant, wonderfully simple set of design patterns. These patterns revolutionize the speed and flexibility with which enterprises can build and support sustainable data integration in our new world of gigantic, oddly structured Big Data and NoSQL, and of seemingly unquenchable demand for analytic insight.
One minor reservation: although the Microsoft BI stack is broad and strong, I don't personally regard its Data Quality Services (DQS) as a particularly useful tool, notwithstanding the interesting implementation coverage it gets in this book. Still, this is a minor issue, and not directly related to Data Vault architecture anyway.
Data solution architects and BI developers who do not understand Data Vault are, in my view, missing out on a compelling architectural choice for agile RDBMS data integration. For readers ready to take it to the next level, especially those willing to tolerate the initial discomfort over the proliferation of tables (more than 2x your source tables), this book will assuredly take you a long way down the road, and you will be rewarded, even astonished, by the flexibility and sustainability that Data Vault affords you once it gets into your blood.
S**.
Great concepts, but the book is loaded with errors
I had high hopes for this book and read it from cover to cover. Data Vault 2.0 offers valuable benefits for abstracting your source data from your presentation layer via an extensible, schema-agnostic middle layer in a 3-tier data warehouse architecture. It serves as the middle normalization layer in a traditional Inmon design, and allows you to integrate new data easily without redesigning what you’ve already built. Data Vault also makes it easy to tie unstructured data (e.g., Hadoop files) into your EDW environment via hashed business key values, for use by data scientists and other advanced users.
The book extensively covers the differences between the Raw Vault, Business Vault, and Information Delivery layers. Examples are split between raw SQL and the SSIS ETL tool. Since SSIS is only one tool among many, this information only clutters up the book for most readers. I would have preferred that this space be used instead to better cover some common scenarios, including:
- When loading the Raw Vault, how to handle the situation where only surrogate key values (i.e., not business key values) are present in the source file.
- How to model the classic Order and Order_Item entities (the book uses airline operational data in its examples).
Unfortunately, I’ve never read a book containing so many errors. Mixing up basic terms such as hub, link, and satellite is inexcusable. The writing is overly lengthy, repetitive, and convoluted. It’s frustrating to have to reread paragraphs two or three times to understand what the author is attempting to convey. Diagrams sometimes don’t match the text, and some even appear in the wrong sections. It’s clear that this book was never proofread, let alone copy edited.
Despite my disappointments, the concepts are sound and there is a great deal of important content here. My hope is that the author will issue a future edition that corrects the many problems.
This is valuable information for any data architect, and it’s clear that the author is an expert in the subject matter. For someone looking to prototype a DV 2.0 environment, I’d recommend reading Roelant Vos’s blog, which contains detailed code examples.
E**A
Good entry point on Data Vault
Enjoyed:
- Touches on all Data Vault entities, DDL, and loading.
Not OK:
- A bit outdated in some respects as of 2022, since Data Vault practice has changed some recommendations.
- I didn't appreciate the fact that all examples use SQL Server, with screenshots of SQL Server or Analysis Services. I think a conceptual reference book should have been agnostic of specific technology.
A**R
Worthwhile but could do with refinements
I'm about halfway through the book now. I think the core modelling concept in Data Vault is strong, flexible and valuable, hence my interest. This method solves many, but not all, of the issues that, say, Kimball and Inmon don't. Crucially, it deals with the key issue of agility, and that alone makes it worthwhile. This book takes the DV concept much further, though, and introduces many additional concepts, such as how to run the project and manage the quality of data and processes. These aren't necessarily what a data modeller/architect would be interested in, and it may be a challenge for them to suggest to their boss that they need to introduce TQM or Six Sigma to do DV2. However, the author(s) are trying to give a complete overview of their best practice. This could be seen either as help for the inexperienced or as stretching a really good idea too far.
I would also make a number of other general observations. First, I find that some of the writing could be better, and it looks like the proofreading was a bit ropey. Second, effort is spent explaining concepts that, for many data architects, will be basic skills/knowledge. An example from one of Kimball's books illustrates this habit, which this book shares. Kimball introduces the idea of the "34 subsystems of ETL", one of which is the surrogate key manager. For many people using SQL on a modern RDBMS, this is just an identity setting on a column in a table. Yet Kimball (and the others) spend nearly half a page explaining it, which arguably gets in the way of simply saying "you need an identity generator". The other issue is that in any data, you will come across a problem; let's call it "Problem X". All methods will say "here's how we deal with Problem X... you use a ProblemXFixer". So basically, someone has to explain the problem and give the fix a name to fit it into their "standard".
In this book, entire chapters are given over to variations on the basic theme of Hubs, Links and Satellites when, really, you could probably just bullet-point each and say "here are a few variations on the theme". So, as with all such standards, many words are used where fewer would do the trick. Having said all of that, I think the core technique and experience on offer here are worth the effort, and I'm off to finish the book. One final thing: the book is quite up to date with MS SQL Server and refers to it quite a lot. You may view that as good or bad depending on your skillset.
M**M
Data Vault Bible is out...and it contains all the holy bits you need
Finally, the Data Vault Bible is out. This is one of those holy grail books on Data Vault that I have been longing for since I was first introduced to the methodology a couple of years ago and was asked to build a data model following DV standards. I referred to online blogs and TDAN articles, and I borrowed a supposed DV book by another author (which turned out not to be a true DV book, as some of its concepts grossly contradicted DV principles), but I still ended up posting my queries on DV forums. That was when the DV inventor, Dan himself, remarked that he was on his way to publishing an official DV book. And here it is. It is simply wonderful, with every DV concept nailed down in incredible detail. What's more, this is a DV 2.0 book, folks, which means it will guide you in the Big Data and Hadoop world when you want to build a data warehouse that integrates the Hadoop stack, while still doing it in an Agile manner. For all data warehouse and DV practitioners, this book is a Bible, Cookbook, Pocket Guide, Complete Reference, Go-To Book, etc., etc., etc.!!!
M**R
Excellent and unique Reference on Data Vault 2.0
This book is an excellent reference on how to use Data Vault in your data warehouse. It starts from the architecture, explains Data Vault modeling, and then goes into the details of loading. It covers all essential parts in depth and allows the reader to construct a data vault. In that regard, it is far better than Dan Linstedt's earlier Super Charge Your Data Warehouse: Invaluable Data Modeling Rules to Implement Your Data Vault. I have learned a lot from the book and I often come back to it, using it as the standard solution and evaluating specific questions about modelling or Data Vault architecture against it. Because I sometimes lend it out and forget to whom, I have had to buy it several times; it is worth it.
A**E
not what I expected
If you would like to know more about the concepts of Data Vault, don't buy this book. That holds even though one of its authors is Dan Linstedt ("Mr. Data Vault" himself), and even though Dan said on his website that you should buy this book.
As others have already commented: if you are particularly interested in lots of SQL listings and pages of screenshots showing how to do (very particular) things in MS SQL Server, then this is probably the book for you.
Of course, there is more in this book: I am thinking of a random collection of buzzwords that seem to differentiate "Data Vault 2.0" from the "old" Data Vault concept. Maybe this is moderately useful for someone, for example to convince non-technical stakeholders that you are doing "the right thing".
This book has not only disappointed me, it has also convinced me that "DV2.0" is nothing more than a heap of b... I would probably think hard before buying any book from the same authors again.
(Still, Data Vault, I mean without the "2.0", has its merits, but if you want to know what it is about, you should simply follow the links in the Wikipedia article to the original papers, and from there on just trust your common sense and not the "celebrities".)