{"id":10030,"date":"2021-09-07T13:23:31","date_gmt":"2021-09-07T12:23:31","guid":{"rendered":"https:\/\/ee.yelkdev.site\/?p=10030"},"modified":"2024-04-16T09:48:09","modified_gmt":"2024-04-16T08:48:09","slug":"data-tips-the-data-warehouse","status":"publish","type":"post","link":"https:\/\/www.equalexperts.com\/blog\/our-thinking\/data-tips-the-data-warehouse\/","title":{"rendered":"Data tips &#8211; the data warehouse"},"content":{"rendered":"<p style=\"text-align: left;\"><span style=\"font-weight: 400;\">Whatever your role, you might come into contact with a data warehouse at some point. They&#8217;re incredibly powerful tools that I&#8217;ve frequently seen ignored or used in ways that cause problems. In this post, I&#8217;ll share a few tips for handling data in your data warehouse effectively. Whilst the focus here is on helping data scientists, these tips have helped other tech professionals answer their own data questions and avoid unnecessary work, too.<\/span><\/p>\n<h2><b>Gaps in traditional data science training<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">Traditional data science training focuses on academic techniques, not the practicalities of dealing with data in the real world. The odds are you&#8217;ll learn five different ways to perform clustering, but you might never write a test or a SQL query.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Fresh data scientists, landing in their first professional roles, can find that the reality of data science is not what they expected. They must deal with messy and unreliable terabyte-scale datasets that they can&#8217;t just load into a Pandas or R dataframe to clean and refine. Their code needs to run efficiently enough to be useful and cost-effective. 
It needs to run in environments other than their laptop, robustly interact with remote services and be maintained over time and by other people.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">This mismatch between training, expectation and reality can cause real problems*. At one end of the scale, you get data scientists producing code that needs to be &#8220;productionised&#8221; (well, rewritten) by &#8220;data engineers&#8221; to be fit for its real-world purpose. At the scarier end of the scale, you get data science time investments that have to be thrown in the bin because a couple of early mistakes mean the work just doesn&#8217;t function in practice.<\/span><\/p>\n<blockquote><p><i><span style=\"font-weight: 400;\">(* this blog post isn\u2019t about the other big expectation mismatch &#8211; how you\u2019ll likely be spending a good chunk of your time doing pretty basic things like writing simple reports, not the cool data science stuff&#8230; sorry <\/span><\/i><i><span style=\"font-weight: 400;\">)<\/span><\/i><\/p><\/blockquote>\n<p><span style=\"font-weight: 400;\">Data science training is adapting to make more room for practical skills. 
Here are a few tips to help you avoid mistakes that I\u2019ve helped both fresh and seasoned data scientists fix out in the field.<\/span><\/p>\n<h1><b>Use your data warehouse effectively<\/b><\/h1>\n<p><span style=\"font-weight: 400;\">There\u2019s a good chance that your data science is applied to data that came from a SQL-based analytics system like <\/span><a href=\"https:\/\/cloud.google.com\/bigquery\"><span style=\"font-weight: 400;\">Google BigQuery<\/span><\/a><span style=\"font-weight: 400;\">, <\/span><a href=\"https:\/\/aws.amazon.com\/redshift\"><span style=\"font-weight: 400;\">AWS Redshift<\/span><\/a><span style=\"font-weight: 400;\">, <\/span><a href=\"https:\/\/azure.microsoft.com\/en-gb\/services\/synapse-analytics\"><span style=\"font-weight: 400;\">Azure Synapse Analytics<\/span><\/a><span style=\"font-weight: 400;\">, <\/span><a href=\"https:\/\/www.snowflake.com\/\"><span style=\"font-weight: 400;\">Snowflake<\/span><\/a><span style=\"font-weight: 400;\"> or even <\/span><a href=\"https:\/\/spark.apache.org\/sql\"><span style=\"font-weight: 400;\">Spark SQL<\/span><\/a><span style=\"font-weight: 400;\">. These systems are engineered to efficiently answer complex queries against big datasets, not to serve up large quantities of raw data. 
You might be able to run:<\/span><\/p>\n<pre><span style=\"font-weight: 400;\">SELECT<\/span>\r\n\r\n<span style=\"font-weight: 400;\">\u00a0\u00a0*<\/span>\r\n\r\n<span style=\"font-weight: 400;\">FROM client_transactions<\/span><\/pre>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">&#8230;and stream out fifty columns by ten million rows to find the top ten clients by spend in <\/span><a href=\"https:\/\/pandas.pydata.org\/\"><span style=\"font-weight: 400;\">Pandas<\/span><\/a><span style=\"font-weight: 400;\">, but it\u2019ll be an awful lot faster and more robust if you do more work in the data warehouse and retrieve just the ten rows and two columns you&#8217;re really interested in:<\/span><\/p>\n<pre><span style=\"font-weight: 400;\">SELECT<\/span>\r\n\r\n<span style=\"font-weight: 400;\">\u00a0\u00a0client_id,<\/span>\r\n\r\n<span style=\"font-weight: 400;\">\u00a0\u00a0SUM(transaction_value) total_spend<\/span>\r\n\r\n<span style=\"font-weight: 400;\">FROM<\/span>\r\n\r\n<span style=\"font-weight: 400;\">\u00a0\u00a0client_transactions<\/span>\r\n\r\n<span style=\"font-weight: 400;\">GROUP BY<\/span>\r\n\r\n<span style=\"font-weight: 400;\">\u00a0\u00a0client_id<\/span>\r\n\r\n<span style=\"font-weight: 400;\">ORDER BY<\/span>\r\n\r\n<span style=\"font-weight: 400;\">\u00a0\u00a0total_spend DESC<\/span>\r\n\r\n<span style=\"font-weight: 400;\">LIMIT 10<\/span><\/pre>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">As the table you&#8217;re accessing grows, the benefits of doing the work in the data warehouse will grow too.<\/span><\/p>\n<blockquote><p><i><span style=\"font-weight: 400;\">I recall working with a data scientist, fresh out of university, who did not want to learn SQL and took the former approach. It worked well enough while the product was in development and soft launch. Three months after public launch, they were trying to download tens of gigabytes of data from the data warehouse. 
We ran out of hardware and networking options to keep the complex analytics pipeline they had built on their laptop running, and it ground to an undignified halt.<\/span><\/i><\/p><\/blockquote>\n<p><span style=\"font-weight: 400;\">Often, some of the work has to be done in your code outside the data warehouse. These tips encourage you to do as much of the work as you can in your data warehouse before you pull out just what you need to process in your code. Modern data warehouses are capable of doing more right there in the warehouse, <\/span><a href=\"https:\/\/cloud.google.com\/bigquery-ml\/docs\/introduction\"><span style=\"font-weight: 400;\">including<\/span><\/a> <a href=\"https:\/\/aws.amazon.com\/redshift\/features\/redshift-ml\/\"><span style=\"font-weight: 400;\">building<\/span><\/a><span style=\"font-weight: 400;\"> machine learning models.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">We\u2019ll focus on SQL for the next few tips as it\u2019s so prevalent, but similar principles and capabilities are often available in whatever data storage and parallel processing system you are using. The tips apply in principle to any query language, so you may be able to apply some of them when working with systems like <\/span><a href=\"https:\/\/neo4j.com\/\"><span style=\"font-weight: 400;\">Neo4j<\/span><\/a><span style=\"font-weight: 400;\"> or <\/span><a href=\"https:\/\/www.mongodb.com\/\"><span style=\"font-weight: 400;\">MongoDB<\/span><\/a><span style=\"font-weight: 400;\">. I&#8217;ll typically refer to Google&#8217;s BigQuery data warehouse, as I find it has excellent SQL feature coverage and the documentation is clear and well-written.<\/span><\/p>\n<h2><b>Tip 1: Learn and use the query language<\/b><\/h2>\n<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/SQL\"><span style=\"font-weight: 400;\">SQL<\/span><\/a><span style=\"font-weight: 400;\"> has been around for a long time &#8211; even longer than I have (just). 
The <\/span><a href=\"https:\/\/twobithistory.org\/2017\/12\/29\/codd-relational-model.html\"><span style=\"font-weight: 400;\">foundations were laid back in 1970<\/span><\/a><span style=\"font-weight: 400;\">. It\u2019s an enormously expressive and powerful language. You can probably do more with it than you think, and making the effort to push as much work as you can into the database, building your SQL knowledge as you go, will pay dividends quickly. You get the added bonus that non-data scientists can understand and help with your queries, as SQL is commonly used by other data folk who perform analytics and reporting functions.<\/span><\/p>\n<p>There\u2019s a lot more power and flexibility in modern SQL than the basic\u00a0<code>SELECT<\/code>,\u00a0<code>WHERE<\/code>, and\u00a0<code>GROUP BY<\/code>\u00a0would suggest.<span style=\"font-weight: 400;\">\u00a0Most databases have a query optimiser that looks at your query and statistics about the tables involved before selecting the most efficient strategy it can find to answer it. 
<\/span><a href=\"https:\/\/learnsql.com\/blog\/illustrated-guide-sql-self-join\/\"><span style=\"font-weight: 400;\">Self-joins<\/span><\/a><span style=\"font-weight: 400;\">, <\/span><a href=\"https:\/\/cloud.google.com\/bigquery\/docs\/nested-repeated\"><span style=\"font-weight: 400;\">nested columns, STRUCT types<\/span><\/a><span style=\"font-weight: 400;\">, <\/span><a href=\"https:\/\/cloud.google.com\/bigquery\/docs\/reference\/standard-sql\/user-defined-functions\"><span style=\"font-weight: 400;\">user-defined functions<\/span><\/a><span style=\"font-weight: 400;\"> and <\/span><a href=\"https:\/\/www.toptal.com\/sql\/intro-to-sql-windows-functions\"><span style=\"font-weight: 400;\">window or analytic functions<\/span><\/a><span style=\"font-weight: 400;\"> are a few capabilities common to many data warehouses that I\u2019ve found very helpful for expressively solving trickier problems, and that are less well known amongst data scientists.<\/span><\/p>\n<blockquote><p><i><span style=\"font-weight: 400;\">I found the first couple of paragraphs in <\/span><\/i><a href=\"https:\/\/cloud.google.com\/bigquery\/docs\/reference\/standard-sql\/analytic-function-concepts\"><i><span style=\"font-weight: 400;\">BigQuery&#8217;s window function documentation<\/span><\/i><\/a><i><span style=\"font-weight: 400;\"> provided a really useful mental model of what they are, how they differ from normal aggregations and when they might be useful.<\/span><\/i><\/p><\/blockquote>\n<p><span style=\"font-weight: 400;\">Once you start using SQL, you\u2019ll find yourself writing larger and more complex queries. 
Let&#8217;s talk about how you can control that complexity and keep your queries manageable.<\/span><\/p>\n<h2><b>Tip 2: Use common table expressions<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">You can think of <\/span><a href=\"https:\/\/cloud.google.com\/bigquery\/docs\/reference\/standard-sql\/query-syntax#with_clause\"><span style=\"font-weight: 400;\">Common Table Expressions (CTEs)<\/span><\/a><span style=\"font-weight: 400;\"> as virtual tables within your query. As your simple query becomes more complex, you might be tempted to bolt together multiple sub-queries like this:<\/span><\/p>\n<pre><span style=\"font-weight: 400;\">SELECT a, b, c<\/span>\r\n\r\n<span style=\"font-weight: 400;\">FROM (SELECT ...)<\/span>\r\n\r\n<span style=\"font-weight: 400;\">INNER JOIN (SELECT ...)<\/span><\/pre>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">This will work, and I\u2019ve seen plenty of queries written this way. Such queries can easily grow into SQL behemoths hundreds of lines long that are really difficult to understand and reason about. The sub-queries are a big part of the problem, nesting complex logic inside other queries. Instead of creating a large, monolithic query, you can pull those sub-queries out into CTEs and give them meaningful names.<\/span><\/p>\n<pre><span style=\"font-weight: 400;\">WITH<\/span>\r\n\r\n<span style=\"font-weight: 400;\">\u00a0\u00a0\u00a0\u00a0clients AS (SELECT ...),<\/span>\r\n\r\n<span style=\"font-weight: 400;\">\u00a0\u00a0\u00a0\u00a0transactions AS (SELECT ...)<\/span>\r\n\r\n<span style=\"font-weight: 400;\">SELECT a, b, c<\/span>\r\n\r\n<span style=\"font-weight: 400;\">FROM clients<\/span>\r\n\r\n<span style=\"font-weight: 400;\">INNER JOIN transactions<\/span><\/pre>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">Now you can see what that query is all about. 
As well as making your queries easier to understand, breaking up a large query like this means you can query the CTEs to see what they are producing while you develop and debug your query. <\/span><a href=\"https:\/\/jamesrledoux.com\/code\/sql-cte-common-table-expressions\"><span style=\"font-weight: 400;\">Here<\/span><\/a><span style=\"font-weight: 400;\"> is a more in-depth tutorial on CTEs.<\/span><\/p>\n<blockquote><p><i><span style=\"font-weight: 400;\">Equal Experts Associate Dawid du Toit highlights one thing to watch out for when using CTEs. If you reference the same CTE multiple times in your query (for example, in a self-join from the CTE back onto itself), you may find the CTE is evaluated once for each reference. In this case, materializing the data into a temporary table may be a better option if the cost of the query is a significant concern.<\/span><\/i><\/p><\/blockquote>\n<h2><b>Tip 3: Use views<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">So you\u2019ve started using CTEs and you\u2019re writing much more expressive queries. Things can still become complicated, and it\u2019s not always convenient to change your query to inspect your CTEs &#8211; for example, when your query is running out there in production and can\u2019t be modified.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Once you\u2019ve got a CTE, it\u2019s straightforward to convert it into a database <\/span><a href=\"https:\/\/cloud.google.com\/bigquery\/docs\/views-intro\"><span style=\"font-weight: 400;\">view<\/span><\/a><span style=\"font-weight: 400;\">. A view is just a query stored in the database, so you can take your CTE and wrap it in a <\/span><strong>CREATE VIEW<\/strong><span style=\"font-weight: 400;\"> statement. 
You can then use the view as if it were a table &#8211; which means you can query it and inspect the data it contains at any time without modifying your deployed queries.<\/span><\/p>\n<blockquote><p><i><span style=\"font-weight: 400;\">Views are usually well optimised and needn&#8217;t introduce any performance or cost penalty. For example, implementing &#8220;<\/span><\/i><a href=\"https:\/\/cloud.google.com\/bigquery\/docs\/reference\/standard-sql\/migrating-from-legacy-sql\"><i><span style=\"font-weight: 400;\">predicate pushdown<\/span><\/i><\/a><i><span style=\"font-weight: 400;\">&#8221; allows a data warehouse to optimise the data and computation the view uses based on how it is queried &#8211; as if the view were just part of the query.<\/span><\/i><\/p><\/blockquote>\n<p><span style=\"font-weight: 400;\">Views are logical infrastructure, and lend themselves to deployment with an &#8220;infrastructure as code&#8221; mindset, in the same way that we might deploy a server or the database itself.<\/span><\/p>\n<pre><span style=\"font-weight: 400;\">CREATE VIEW clients AS (SELECT ...);<\/span>\r\n\r\n<span style=\"font-weight: 400;\">CREATE VIEW transactions AS (SELECT ...);<\/span><\/pre>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">Then, we can use these views in queries, whether executed ad-hoc or as part of a data pipeline.<\/span><\/p>\n<pre><span style=\"font-weight: 400;\">SELECT a, b, c<\/span>\r\n\r\n<span style=\"font-weight: 400;\">FROM clients<\/span>\r\n\r\n<span style=\"font-weight: 400;\">INNER JOIN transactions<\/span><\/pre>\n<p>&nbsp;<\/p>\n<blockquote><p><i><span style=\"font-weight: 400;\">Database views provide a much more flexible interface to underlying tables. The underlying data will eventually need to change. 
If consumers access it through views, the views can adapt to protect those consumers from the change, avoiding broken queries and irate consumers.<\/span><\/i><\/p><\/blockquote>\n<p><span style=\"font-weight: 400;\">Views are also a great way to avoid data engineering complexity. I&#8217;ve seen plenty of complex data processing pipelines that populate tables with refined results from other tables, ready for further processing. In every case I can recall, a view would have been a viable alternative, and it avoids managing the data in intermediate tables. No complex data synchronisation challenges like backfilling, scheduling or merging here! If your data warehouse supports it, a <\/span><a href=\"https:\/\/cloud.google.com\/bigquery\/docs\/materialized-views\"><span style=\"font-weight: 400;\">materialized view<\/span><\/a><span style=\"font-weight: 400;\"> can work where there are genuine reasons, such as performance or cost, to avoid directly querying the source table.<\/span><\/p>\n<h2><b>Tip 4: Use user-defined functions<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">Databases have supported <\/span><a href=\"https:\/\/cloud.google.com\/bigquery\/docs\/reference\/standard-sql\/user-defined-functions\"><span style=\"font-weight: 400;\">User-Defined Functions (UDFs)<\/span><\/a><span style=\"font-weight: 400;\"> for a long time. This useful capability allows you to capture logic right in the database, give it a meaningful name, and reuse it in different queries. A simple UDF might parse a list from a comma-separated string and return the first element. 
For example, consider a function like this (using the BigQuery SQL dialect):<\/span><\/p>\n<pre><span style=\"font-weight: 400;\">CREATE FUNCTION my_dataset.get_first_post(post_list STRING)<\/span>\r\n\r\n<span style=\"font-weight: 400;\">RETURNS STRING<\/span>\r\n\r\n<span style=\"font-weight: 400;\">AS (<\/span>\r\n\r\n<span style=\"font-weight: 400;\">\u00a0\u00a0SPLIT(post_list, ',')[SAFE_OFFSET(0)]<\/span>\r\n\r\n<span style=\"font-weight: 400;\">)<\/span><\/pre>\n<p>&nbsp;<\/p>\n<blockquote><p><i><span style=\"font-weight: 400;\">Specify the return type of your function, even if it can be inferred. That way, you&#8217;ll know if you&#8217;re returning something totally different to what you expected.<\/span><\/i><\/p><\/blockquote>\n<p><span style=\"font-weight: 400;\">We can use the function in other queries:<\/span><\/p>\n<pre><span style=\"font-weight: 400;\">WITH post_lists AS (<\/span>\r\n\r\n<span style=\"font-weight: 400;\">\u00a0\u00a0SELECT 'a,b,c,d' AS post_list<\/span>\r\n\r\n<span style=\"font-weight: 400;\">)<\/span>\r\n\r\n<span style=\"font-weight: 400;\">SELECT<\/span>\r\n\r\n<span style=\"font-weight: 400;\">\u00a0\u00a0post_list,<\/span>\r\n\r\n<span style=\"font-weight: 400;\">\u00a0\u00a0my_dataset.get_first_post(post_list) first_post<\/span>\r\n\r\n<span style=\"font-weight: 400;\">FROM post_lists<\/span><\/pre>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">The result would be as follows:<\/span><\/p>\n<table>\n<tbody>\n<tr>\n<td><b>post_list<\/b><\/td>\n<td><b>first_post<\/b><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">a,b,c,d<\/span><\/td>\n<td><span style=\"font-weight: 400;\">a<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">The benefits might not be clear from this simple function, but I recently implemented a <\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/Levenshtein_distance\"><span style=\"font-weight: 400;\">Levenshtein 
Distance<\/span><\/a><span style=\"font-weight: 400;\"> function using <\/span><a href=\"https:\/\/cloud.google.com\/bigquery\/docs\/reference\/standard-sql\/user-defined-functions#javascript-udf-structure\"><span style=\"font-weight: 400;\">BigQuery&#8217;s JavaScript UDF capability<\/span><\/a><span style=\"font-weight: 400;\">, allowing us to find likely typos in a large dataset in the data warehouse.<\/span><\/p>\n<blockquote><p><i><span style=\"font-weight: 400;\">Dawid recalls seeing UDFs that take a parameter and then use it to query another table. This can cripple your query performance, especially where you could simply have joined with the other table. Be wary of using UDFs in this way!<\/span><\/i><\/p><\/blockquote>\n<p><span style=\"font-weight: 400;\">A benefit that might not be immediately obvious is that you can deploy a fix or improvement to your UDF, and all queries using it will pick up the change when they are next executed. Be aware that there may be little of the structural flexibility you might be used to in other programming languages. For example, you might not be able to &#8220;overload&#8221; a UDF to handle an additional argument, nor provide a default value for an argument.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">These limitations can make handling change even harder than in other languages &#8211; but I&#8217;ve found it&#8217;s still a big improvement over copying and pasting.<\/span><\/p>\n<h2><b>Tip 5: Understand what\u2019s efficient<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">Your data warehouse is likely to be the most cost-efficient place to do your number-crunching. After all, modern data warehouses are really just massively parallel compute clusters.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">There are typically ways to make your processing more efficient &#8211; which always means cheaper and usually means faster. 
For example, a clickstream table might stretch back five years and be 150TB in total size. <\/span><a href=\"https:\/\/cloud.google.com\/bigquery\/docs\/partitioned-tables\"><span style=\"font-weight: 400;\">Partitioning<\/span><\/a><span style=\"font-weight: 400;\"> that table by the event date means that the data warehouse can just scan yesterday\u2019s data rather than the whole table to give you a total of some metric for yesterday &#8211; so long as you construct your query to take advantage of it. In this case, it&#8217;s likely as straightforward as ensuring your query&#8217;s\u00a0<code>WHERE<\/code> clause contains a simple date constraint on the partition column.<\/span><\/p>\n<blockquote><p><i><span style=\"font-weight: 400;\">Understanding how your data warehouse is optimised (or how to optimise it, if you\u2019re setting up the table) can make a real difference to your query performance and cost.<\/span><\/i><\/p><\/blockquote>\n<h1><b>Writing better SQL, faster<\/b><\/h1>\n<p><span style=\"font-weight: 400;\">As you gain competence in writing SQL (or any language), you\u2019ll find your queries and programs becoming more complex. That will make tracking down problems more difficult. Here are a few tips to help you find and squash your bugs more easily, be more confident that your code does what it\u2019s supposed to, and avoid the problems in the first place!<\/span><\/p>\n<h2><b>Tip 6: Test your UDFs<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">You can quite easily test your UDFs. 
Here\u2019s a cut-down version of the query I wrote to test that Levenshtein distance function I mentioned earlier:<\/span><\/p>\n<pre><span style=\"font-weight: 400;\">--- create a virtual table of function inputs and expected outputs, via a CTE<\/span>\r\n\r\n<span style=\"font-weight: 400;\">WITH examples AS (<\/span>\r\n\r\n<span style=\"font-weight: 400;\">\u00a0\u00a0\u00a0\u00a0SELECT 'kittens' AS term_1, 'mittens' AS term_2, 1 AS expected<\/span>\r\n\r\n<span style=\"font-weight: 400;\">\u00a0\u00a0\u00a0\u00a0UNION ALL SELECT 'kittens', 'kittens', 0<\/span>\r\n\r\n<span style=\"font-weight: 400;\">)<\/span>\r\n\r\n\r\n\r\n\r\n<span style=\"font-weight: 400;\">--- create a new CTE that injects the actual outputs from the function<\/span>\r\n\r\n<span style=\"font-weight: 400;\">, test AS (<\/span>\r\n\r\n<span style=\"font-weight: 400;\">\u00a0\u00a0\u00a0\u00a0SELECT<\/span>\r\n\r\n<span style=\"font-weight: 400;\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0*,<\/span>\r\n\r\n<span style=\"font-weight: 400;\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0my_dataset.levenshtein_distance(term_1, term_2) actual<\/span>\r\n\r\n<span style=\"font-weight: 400;\">\u00a0\u00a0\u00a0\u00a0FROM examples<\/span>\r\n\r\n<span style=\"font-weight: 400;\">)<\/span>\r\n\r\n\r\n\r\n\r\n<span style=\"font-weight: 400;\">SELECT<\/span>\r\n\r\n<span style=\"font-weight: 400;\">\u00a0\u00a0\u00a0\u00a0*,<\/span>\r\n\r\n<span style=\"font-weight: 400;\">\u00a0\u00a0\u00a0\u00a0IF(TO_JSON_STRING(actual) = TO_JSON_STRING(expected), 'pass', 'fail') result<\/span>\r\n\r\n<span style=\"font-weight: 400;\">FROM test<\/span>\r\n\r\n\r\n<\/pre>\n<p><span style=\"font-weight: 400;\">Results:<\/span><\/p>\n<table>\n<tbody>\n<tr>\n<td><b>term_1<\/b><\/td>\n<td><b>term_2<\/b><\/td>\n<td><b>expected<\/b><\/td>\n<td><b>actual<\/b><\/td>\n<td><b>result<\/b><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">kittens<\/span><\/td>\n<td><span style=\"font-weight: 
400;\">mittens<\/span><\/td>\n<td><span style=\"font-weight: 400;\">1<\/span><\/td>\n<td><span style=\"font-weight: 400;\">1<\/span><\/td>\n<td><span style=\"font-weight: 400;\">pass<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">kittens<\/span><\/td>\n<td><span style=\"font-weight: 400;\">kittens<\/span><\/td>\n<td><span style=\"font-weight: 400;\">0<\/span><\/td>\n<td><span style=\"font-weight: 400;\">0<\/span><\/td>\n<td><span style=\"font-weight: 400;\">pass<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">If the query produces any rows that do not contain &#8216;pass&#8217; in the result column, we know that there was an issue &#8211; that the actual function output did not match our expectation.<\/span><\/p>\n<blockquote><p><i><span style=\"font-weight: 400;\">Although not needed for these examples, I typically use a function like\u00a0<code>TO_JSON_STRING<\/code><\/span><\/i><i><span style=\"font-weight: 400;\">\u00a0that \u201cstringifies\u201d the expected and actual content in a deterministic manner so that structured values and NULL are compared correctly. Be aware that <\/span><\/i><a href=\"https:\/\/cloud.google.com\/bigquery\/docs\/reference\/standard-sql\/json_functions#to_json_string\"><i><span style=\"font-weight: 400;\">the documentation for TO_JSON_STRING<\/span><\/i><\/a><i><span style=\"font-weight: 400;\"> does not guarantee deterministic order of keys in STRUCTs, but I&#8217;ve seen no problems in practice.<\/span><\/i><\/p><\/blockquote>\n<p><span style=\"font-weight: 400;\">Tests have one more huge benefit that might not be immediately obvious. When you do data science for an organisation, it\u2019s likely that someone else will one day need to understand your work. They might even need to make changes. 
Having some runnable tests means that both of those things will be easier, and that person will be much more likely to think fondly of you.<\/span><\/p>\n<h2><b>Tip 7: Test your views<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">Your views will depend on tables, and as you use these techniques you\u2019ll find that you end up with views that depend on other views. It\u2019s handy to know that your views actually do what they are supposed to. You can test views in a similar fashion to UDFs, but it is a little more involved because, unlike a UDF, a view must explicitly refer to the tables or views that provide the data it uses.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">I have rolled my own testing solutions for these networks of views before. You can create test data, for example in the form of CSV files, that you upload to temporary tables. You can then create temporary versions of your views that use these tables with known, fixed content. The same testing approach outlined for UDFs can now check that your views produce the right data, for the same benefits!<\/span><\/p>\n<blockquote><p><i><span style=\"font-weight: 400;\">A templating tool such as <\/span><\/i><a href=\"https:\/\/palletsprojects.com\/p\/jinja\/\"><i><span style=\"font-weight: 400;\">Jinja<\/span><\/i><\/a><i><span style=\"font-weight: 400;\"> or <\/span><\/i><a href=\"https:\/\/www.gnu.org\/software\/gettext\/manual\/html_node\/envsubst-Invocation.html\"><i><span style=\"font-weight: 400;\">envsubst<\/span><\/i><\/a><i><span style=\"font-weight: 400;\"> can provide a relatively straightforward way to use environment variables to change the values in queries and views that vary between environments. 
A query containing\u00a0<code>{{my_dataset}}.my_table<\/code>\u00a0can be adjusted to\u00a0<code>test.my_table<\/code>\u00a0in the test environment and\u00a0<code>prod.my_table<\/code>\u00a0in production.<\/span><\/i><\/p><\/blockquote>\n<h2><b>Tip 8: Use dbt<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">I\u2019ll also mention some tooling here. We&#8217;ve found dbt quite effective for managing networks of views, like those described above. It directly supports flexible references for source views and tables, provides built-in data and view testing capabilities and can even generate documentation, as in <\/span><a href=\"https:\/\/www.getdbt.com\/mrr-playbook\/#!\/overview\"><span style=\"font-weight: 400;\">this example<\/span><\/a><span style=\"font-weight: 400;\">. <\/span><span style=\"font-weight: 400;\">Dataform<\/span><span style=\"font-weight: 400;\"> is an alternative providing similar capabilities. Given that it\u2019s recently been acquired by Google, it\u2019ll likely be making an appearance as part of GCP soon (thanks to Equal Experts Associate Gareth Rowlands for pointing this out!).<\/span><\/p>\n<blockquote><p><i><span style=\"font-weight: 400;\">Data Studio team member <\/span><\/i><a href=\"https:\/\/equalexperts.blogin.co\/users\/42872\"><i><span style=\"font-weight: 400;\">Cl\u00e1udio Diniz<\/span><\/i><\/a><i><span style=\"font-weight: 400;\"> has also been using dbt. 
You can find more details in <\/span><\/i><a href=\"https:\/\/www.equalexperts.com\/blog\/our-thinking\/how-we-ended-up-creating-language-agnostic-data-pipelines-for-our-customers-at-equal-experts\/\"><i><span style=\"font-weight: 400;\">his post<\/span><\/i><\/a><i><span style=\"font-weight: 400;\"> and the related example <\/span><\/i><a href=\"https:\/\/github.com\/cdiniz\/language-agnostic-data-pipelines\"><i><span style=\"font-weight: 400;\">GitHub repo<\/span><\/i><\/a><i><span style=\"font-weight: 400;\">.\u00a0<\/span><\/i><\/p><\/blockquote>\n<h2><b>Tip 9: Your time is valuable<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">Organisations often promote directives to save costs wherever possible. More than once, I\u2019ve come across posters, screens and mailshots reminding staff to think about how much things cost and to save every penny they can. In principle, that\u2019s a great directive. Unfortunately, it can create focus on small costs that are easy to quantify at the expense of larger costs that are more obscure.<\/span><\/p>\n<p>It\u2019s great to focus attention on a query that would process 10TB but could run for 1\/1000th of that cost with the addition of a\u00a0<code>WHERE<\/code>\u00a0clause that takes advantage of partitioning. Why not save easy money?<\/p>\n<p><span style=\"font-weight: 400;\">On the other hand, I\u2019ve seen occasions when hours or days of work are invested to make a query more efficient, perhaps saving 2p per day at the expense of &#8211; let\u2019s be conservative &#8211; two days of a data scientist\u2019s time. 
At an average salary of \u00a350k (roughly \u00a3200 per working day, assuming around 250 working days a year), that means we\u2019re looking at a <\/span><b>\u00a3400 investment to save \u00a37.30 per year<\/b><span style=\"font-weight: 400;\"> &#8211; and we\u2019re ignoring the <\/span><b>opportunity cost<\/b><span style=\"font-weight: 400;\"> of the other work you could have done in that time.<\/span><\/p>\n<blockquote><p><i><span style=\"font-weight: 400;\">This is a naive estimate. How much you cost your company is complicated. I\u2019m using the gross salary as an estimate of employee cost. I think the true cost of a typical employee to a typical company is still likely to be higher than the gross salary after factors like tax breaks and equipment costs are taken into account.<\/span><\/i><\/p><\/blockquote>\n<p><span style=\"font-weight: 400;\">Don\u2019t forget to consider what your time is worth!<\/span><\/p>\n<h2><b>Bonus Tip: Google it!<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">The last tip I\u2019d share with you is to check for existing solutions before you build one yourself. The odds are that the problem you are thinking about isn\u2019t novel &#8211; others have likely come across it before. If you\u2019re lucky, you\u2019ll be able to understand and use their solution directly. If not, you\u2019ll learn about someone else\u2019s experience and you can use that to inform your solution.<\/span><\/p>\n<blockquote><p><i><span style=\"font-weight: 400;\">Reading about a solution that didn\u2019t work is a lot quicker than inventing it and discovering that it doesn\u2019t work yourself.<\/span><\/i><\/p><\/blockquote>\n<h2><b>The end\u2026?<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">If you got this far, thanks for reading this post! 
I hope some of the tips here will be useful.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">There\u2019s more to productive data science than effective use of your data warehouse, so watch this space for more hints and tips in future.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Whatever your role, you might come into contact with a data warehouse at some point. They&#8217;re incredibly powerful tools that are frequently ignored or used in ways that cause problems. This post contains tips for handling data in your data warehouse effectively.<\/p>\n","protected":false},"author":168,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","inline_featured_image":false,"footnotes":""},"categories":[5],"tags":[185,207],"location":[397],"class_list":["post-10030","post","type-post","status-publish","format-standard","hentry","category-our-thinking","tag-data","tag-data-warehouse"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Data tips - the data warehouse | Equal Experts<\/title>\n<meta name=\"description\" content=\"Hints and tips for data scientists and other tech professionals on how to handle data in your data warehouse effectively.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.equalexperts.com\/blog\/our-thinking\/data-tips-the-data-warehouse\/\" \/>\n<meta property=\"og:locale\" content=\"en_GB\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Data tips - the data warehouse\" \/>\n<meta property=\"og:description\" content=\"Whatever your role, you might come into contact with a data warehouse at some point. 
They&#039;re incredibly powerful tools that are frequently ignored or used in ways that cause problems. This post contains tips for handling data in your data warehouse effectively.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.equalexperts.com\/blog\/our-thinking\/data-tips-the-data-warehouse\/\" \/>\n<meta property=\"og:site_name\" content=\"Equal Experts\" \/>\n<meta property=\"article:published_time\" content=\"2021-09-07T12:23:31+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-04-16T08:48:09+00:00\" \/>\n<meta name=\"author\" content=\"Paul Brabban\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:title\" content=\"Data tips - the data warehouse\" \/>\n<meta name=\"twitter:description\" content=\"Whatever your role, you might come into contact with a data warehouse at some point. They&#039;re incredibly powerful tools that are frequently ignored or used in ways that cause problems. This post contains tips for handling data in your data warehouse effectively.\" \/>\n<meta name=\"twitter:creator\" content=\"@EqualExperts\" \/>\n<meta name=\"twitter:site\" content=\"@EqualExperts\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Paul Brabban\" \/>\n\t<meta name=\"twitter:label2\" content=\"Estimated reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"14 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.equalexperts.com\/blog\/our-thinking\/data-tips-the-data-warehouse\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.equalexperts.com\/blog\/our-thinking\/data-tips-the-data-warehouse\/\"},\"author\":{\"name\":\"Paul Brabban\",\"@id\":\"https:\/\/www.equalexperts.com\/#\/schema\/person\/ad309d5a8484849a75e1bdd9fe56878c\"},\"headline\":\"Data tips &#8211; the data 
warehouse\",\"datePublished\":\"2021-09-07T12:23:31+00:00\",\"dateModified\":\"2024-04-16T08:48:09+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.equalexperts.com\/blog\/our-thinking\/data-tips-the-data-warehouse\/\"},\"wordCount\":3077,\"publisher\":{\"@id\":\"https:\/\/www.equalexperts.com\/#organization\"},\"keywords\":[\"data\",\"data warehouse\"],\"articleSection\":[\"Our Thinking\"],\"inLanguage\":\"en-GB\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.equalexperts.com\/blog\/our-thinking\/data-tips-the-data-warehouse\/\",\"url\":\"https:\/\/www.equalexperts.com\/blog\/our-thinking\/data-tips-the-data-warehouse\/\",\"name\":\"Data tips - the data warehouse | Equal Experts\",\"isPartOf\":{\"@id\":\"https:\/\/www.equalexperts.com\/#website\"},\"datePublished\":\"2021-09-07T12:23:31+00:00\",\"dateModified\":\"2024-04-16T08:48:09+00:00\",\"description\":\"Hints and tips for data scientists and other tech professionals on how to handle data in your data warehouse effectively.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.equalexperts.com\/blog\/our-thinking\/data-tips-the-data-warehouse\/#breadcrumb\"},\"inLanguage\":\"en-GB\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.equalexperts.com\/blog\/our-thinking\/data-tips-the-data-warehouse\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.equalexperts.com\/blog\/our-thinking\/data-tips-the-data-warehouse\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.equalexperts.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Data tips &#8211; the data warehouse\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.equalexperts.com\/#website\",\"url\":\"https:\/\/www.equalexperts.com\/\",\"name\":\"Equal Experts\",\"description\":\"Making Software. 
Better.\",\"publisher\":{\"@id\":\"https:\/\/www.equalexperts.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.equalexperts.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-GB\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.equalexperts.com\/#organization\",\"name\":\"Equal Experts\",\"url\":\"https:\/\/www.equalexperts.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-GB\",\"@id\":\"https:\/\/www.equalexperts.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.equalexperts.com\/wp-content\/uploads\/2018\/08\/Equal_Experts_Logo_CMYK_Colour.jpg\",\"contentUrl\":\"https:\/\/www.equalexperts.com\/wp-content\/uploads\/2018\/08\/Equal_Experts_Logo_CMYK_Colour.jpg\",\"width\":719,\"height\":340,\"caption\":\"Equal Experts\"},\"image\":{\"@id\":\"https:\/\/www.equalexperts.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/EqualExperts\",\"https:\/\/www.linkedin.com\/company\/equal-experts\/?viewAsMember=true\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.equalexperts.com\/#\/schema\/person\/ad309d5a8484849a75e1bdd9fe56878c\",\"name\":\"Paul Brabban\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-GB\",\"@id\":\"https:\/\/www.equalexperts.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/a3520a00b97202664d60d70e71cb1aa5e1de19cd19a34f37d5622a973493db53?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/a3520a00b97202664d60d70e71cb1aa5e1de19cd19a34f37d5622a973493db53?s=96&d=mm&r=g\",\"caption\":\"Paul Brabban\"}}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Data tips - the data warehouse | Equal Experts","description":"Hints and tips for data scientists and other tech professionals on how to handle data in your data warehouse effectively.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.equalexperts.com\/blog\/our-thinking\/data-tips-the-data-warehouse\/","og_locale":"en_GB","og_type":"article","og_title":"Data tips - the data warehouse","og_description":"Whatever your role, you might come into contact with a data warehouse at some point. They're incredibly powerful tools that are frequently ignored or used in ways that cause problems. This post contains tips for handling data in your data warehouse effectively.","og_url":"https:\/\/www.equalexperts.com\/blog\/our-thinking\/data-tips-the-data-warehouse\/","og_site_name":"Equal Experts","article_published_time":"2021-09-07T12:23:31+00:00","article_modified_time":"2024-04-16T08:48:09+00:00","author":"Paul Brabban","twitter_card":"summary_large_image","twitter_title":"Data tips - the data warehouse","twitter_description":"Whatever your role, you might come into contact with a data warehouse at some point. They're incredibly powerful tools that are frequently ignored or used in ways that cause problems. 
This post contains tips for handling data in your data warehouse effectively.","twitter_creator":"@EqualExperts","twitter_site":"@EqualExperts","twitter_misc":{"Written by":"Paul Brabban","Estimated reading time":"14 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.equalexperts.com\/blog\/our-thinking\/data-tips-the-data-warehouse\/#article","isPartOf":{"@id":"https:\/\/www.equalexperts.com\/blog\/our-thinking\/data-tips-the-data-warehouse\/"},"author":{"name":"Paul Brabban","@id":"https:\/\/www.equalexperts.com\/#\/schema\/person\/ad309d5a8484849a75e1bdd9fe56878c"},"headline":"Data tips &#8211; the data warehouse","datePublished":"2021-09-07T12:23:31+00:00","dateModified":"2024-04-16T08:48:09+00:00","mainEntityOfPage":{"@id":"https:\/\/www.equalexperts.com\/blog\/our-thinking\/data-tips-the-data-warehouse\/"},"wordCount":3077,"publisher":{"@id":"https:\/\/www.equalexperts.com\/#organization"},"keywords":["data","data warehouse"],"articleSection":["Our Thinking"],"inLanguage":"en-GB"},{"@type":"WebPage","@id":"https:\/\/www.equalexperts.com\/blog\/our-thinking\/data-tips-the-data-warehouse\/","url":"https:\/\/www.equalexperts.com\/blog\/our-thinking\/data-tips-the-data-warehouse\/","name":"Data tips - the data warehouse | Equal Experts","isPartOf":{"@id":"https:\/\/www.equalexperts.com\/#website"},"datePublished":"2021-09-07T12:23:31+00:00","dateModified":"2024-04-16T08:48:09+00:00","description":"Hints and tips for data scientists and other tech professionals on how to handle data in your data warehouse 
effectively.","breadcrumb":{"@id":"https:\/\/www.equalexperts.com\/blog\/our-thinking\/data-tips-the-data-warehouse\/#breadcrumb"},"inLanguage":"en-GB","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.equalexperts.com\/blog\/our-thinking\/data-tips-the-data-warehouse\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.equalexperts.com\/blog\/our-thinking\/data-tips-the-data-warehouse\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.equalexperts.com\/"},{"@type":"ListItem","position":2,"name":"Data tips &#8211; the data warehouse"}]},{"@type":"WebSite","@id":"https:\/\/www.equalexperts.com\/#website","url":"https:\/\/www.equalexperts.com\/","name":"Equal Experts","description":"Making Software. Better.","publisher":{"@id":"https:\/\/www.equalexperts.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.equalexperts.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-GB"},{"@type":"Organization","@id":"https:\/\/www.equalexperts.com\/#organization","name":"Equal Experts","url":"https:\/\/www.equalexperts.com\/","logo":{"@type":"ImageObject","inLanguage":"en-GB","@id":"https:\/\/www.equalexperts.com\/#\/schema\/logo\/image\/","url":"https:\/\/www.equalexperts.com\/wp-content\/uploads\/2018\/08\/Equal_Experts_Logo_CMYK_Colour.jpg","contentUrl":"https:\/\/www.equalexperts.com\/wp-content\/uploads\/2018\/08\/Equal_Experts_Logo_CMYK_Colour.jpg","width":719,"height":340,"caption":"Equal Experts"},"image":{"@id":"https:\/\/www.equalexperts.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/EqualExperts","https:\/\/www.linkedin.com\/company\/equal-experts\/?viewAsMember=true"]},{"@type":"Person","@id":"https:\/\/www.equalexperts.com\/#\/schema\/person\/ad309d5a8484849a75e1bdd9fe56878c","name":"Paul 
Brabban","image":{"@type":"ImageObject","inLanguage":"en-GB","@id":"https:\/\/www.equalexperts.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/a3520a00b97202664d60d70e71cb1aa5e1de19cd19a34f37d5622a973493db53?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/a3520a00b97202664d60d70e71cb1aa5e1de19cd19a34f37d5622a973493db53?s=96&d=mm&r=g","caption":"Paul Brabban"}}]}},"_links":{"self":[{"href":"https:\/\/www.equalexperts.com\/wp-json\/wp\/v2\/posts\/10030","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.equalexperts.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.equalexperts.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.equalexperts.com\/wp-json\/wp\/v2\/users\/168"}],"replies":[{"embeddable":true,"href":"https:\/\/www.equalexperts.com\/wp-json\/wp\/v2\/comments?post=10030"}],"version-history":[{"count":0,"href":"https:\/\/www.equalexperts.com\/wp-json\/wp\/v2\/posts\/10030\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.equalexperts.com\/wp-json\/wp\/v2\/media?parent=10030"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.equalexperts.com\/wp-json\/wp\/v2\/categories?post=10030"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.equalexperts.com\/wp-json\/wp\/v2\/tags?post=10030"},{"taxonomy":"location","embeddable":true,"href":"https:\/\/www.equalexperts.com\/wp-json\/wp\/v2\/location?post=10030"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}