Problem:
The dbt tool generates its static documentation as several files (index.html, catalog.json, manifest.json).
index.html loads the two JSON files over the network, so without a web server it is not possible to read or share this documentation.
- Locally, the browser's CORS security prevents index.html from loading the JSON files (https://en.wikipedia.org/wiki/Cross-origin_resource_sharing)
- On some cloud storage services (such as Google Cloud Storage), only a single static page can be shared
Solution:
Update the JavaScript code inside index.html: put the content of the JSON files directly in this file and remove the network loading.
Other information:
Tested with dbt==1.0.0 and dbt==0.20.2.
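Because the trick depends on minified JavaScript that may change between releases, it can help to warn when manifest.json comes from a version other than the two listed above. The sketch below uses a hypothetical helper, check_dbt_version, and assumes the generating version is recorded under metadata.dbt_version in manifest.json.

```python
import json

# Versions this notebook was reported to work with (from the notes above)
TESTED_DBT_VERSIONS = {"1.0.0", "0.20.2"}


def check_dbt_version(manifest: dict) -> bool:
    """Return True if manifest.json was produced by a tested dbt version.

    Assumption: the version lives under manifest["metadata"]["dbt_version"].
    """
    version = manifest.get("metadata", {}).get("dbt_version")
    if version not in TESTED_DBT_VERSIONS:
        print(f"Warning: dbt version {version!r} has not been tested with this notebook")
        return False
    return True


# Usage with a toy manifest; a real one would be read from target/manifest.json
toy_manifest = json.loads('{"metadata": {"dbt_version": "1.0.0"}, "nodes": {}}')
print(check_dbt_version(toy_manifest))
```

A warning does not mean the inlining will fail; it only signals that the search string below should be checked by hand.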
In [1]:
import json
import re
import os
PATH_DBT_PROJECT = ""  # root folder of the dbt project ("" means the current directory)
In [2]:
search_str = 'o=[i("manifest","manifest.json"+t),i("catalog","catalog.json"+t)]'
with open(os.path.join(PATH_DBT_PROJECT, 'target', 'index.html'), 'r', encoding='utf-8') as f:
    content_index = f.read()
# Fail early if this dbt version minified loadProject() differently
if search_str not in content_index:
    raise ValueError('Loader code not found in index.html; check the dbt version')
with open(os.path.join(PATH_DBT_PROJECT, 'target', 'manifest.json'), 'r', encoding='utf-8') as f:
    json_manifest = json.load(f)
# In the static website there are 2 more projects inside the documentation: dbt and dbt_bigquery
# This is technical information that we don't want to show to our final users, so we drop it
# Note: this depends on the adapter; here we use BigQuery
IGNORE_PROJECTS = ['dbt', 'dbt_bigquery']
for element_type in ['nodes', 'sources', 'macros', 'parent_map', 'child_map']:  # navigate into manifest
    # Iterate over a copy of the keys so the dict can shrink during the loop;
    # the default value {} handles a missing element_type
    for key in list(json_manifest.get(element_type, {}).keys()):
        for ignore_project in IGNORE_PROJECTS:
            # Match any key containing '.<ignore_project>.', e.g. 'macro.dbt_bigquery.some_macro'
            if re.match(fr'^.*\.{re.escape(ignore_project)}\.', key):
                del json_manifest[element_type][key]  # delete element
                break  # key is gone, do not test the remaining projects
with open(os.path.join(PATH_DBT_PROJECT, 'target', 'catalog.json'), 'r', encoding='utf-8') as f:
    json_catalog = json.load(f)
with open(os.path.join(PATH_DBT_PROJECT, 'target', 'index2.html'), 'w', encoding='utf-8') as f:
    # Replace the two network requests with the JSON data inlined as JavaScript objects
    new_str = "o=[{label: 'manifest', data: "+json.dumps(json_manifest)+"},{label: 'catalog', data: "+json.dumps(json_catalog)+"}]"
    new_content = content_index.replace(search_str, new_str)
    f.write(new_content)
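The project filtering in the cell above can be isolated into a small function and tried on a toy manifest. This is a sketch: drop_projects is a hypothetical name, and the unique IDs in the toy data are made up but follow dbt's 'resource_type.project.name' shape. re.search with '\.dbt\.' is equivalent to the notebook's re.match(r'^.*\.dbt\.').

```python
import re

# Manifest sections whose keys are unique IDs such as 'macro.<project>.<name>'
ELEMENT_TYPES = ['nodes', 'sources', 'macros', 'parent_map', 'child_map']


def drop_projects(manifest: dict, ignore_projects: list) -> dict:
    """Remove, in place, every entry whose unique ID contains '.<project>.'."""
    patterns = [re.compile(fr'\.{re.escape(p)}\.') for p in ignore_projects]
    for element_type in ELEMENT_TYPES:
        section = manifest.get(element_type, {})
        for key in list(section.keys()):  # copy the keys: the dict shrinks during the loop
            if any(p.search(key) for p in patterns):
                del section[key]
    return manifest


# Toy manifest with hypothetical IDs
toy = {
    'macros': {
        'macro.dbt.run_query': {},
        'macro.dbt_bigquery.bigquery__create_table_as': {},
        'macro.my_project.cents_to_dollars': {},
    },
    'nodes': {'model.my_project.orders': {}},
}
drop_projects(toy, ['dbt', 'dbt_bigquery'])
print(sorted(toy['macros']))  # ['macro.my_project.cents_to_dollars']
```

The trailing dot in the pattern is what keeps 'dbt' from also matching 'dbt_bigquery' or a user project whose name merely starts with "dbt".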
Understanding the JavaScript code:
Searching the HTML file for the string "manifest.json" or "catalog.json" leads to the function loadProject().
This is where the data is loaded.
n.loadProject = function() {
    var t = "?cb=" + new Date().getTime(),
        o = [i("manifest", "manifest.json" + t), i("catalog", "catalog.json" + t)];
    // ...
}
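The notebook searches for this line as an exact string, which breaks if the minifier emits different spacing. A slightly more tolerant lookup is sketched below with a regular expression; find_loader and LOADER_PATTERN are hypothetical names, and the sketch still assumes the one-letter names o, i and t are the same as in the tested dbt versions.

```python
import re

# Allows optional whitespace between tokens; the names o, i and t are assumed fixed
LOADER_PATTERN = re.compile(
    r'\bo\s*=\s*\[\s*'
    r'i\(\s*"manifest"\s*,\s*"manifest\.json"\s*\+\s*t\s*\)\s*,\s*'
    r'i\(\s*"catalog"\s*,\s*"catalog\.json"\s*\+\s*t\s*\)\s*\]'
)


def find_loader(html: str) -> str:
    """Return the exact loader expression found in html, or raise if it is absent."""
    match = LOADER_PATTERN.search(html)
    if match is None:
        raise ValueError('loadProject() loader not found; the minified code may differ')
    return match.group(0)


minified = 'var t="?cb="+x,o=[i("manifest","manifest.json"+t),i("catalog","catalog.json"+t)];'
pretty = 'o = [i("manifest", "manifest.json" + t), i("catalog", "catalog.json" + t)];'
print(find_loader(minified))
print(find_loader(pretty))
```

The returned string can be passed to content_index.replace() in place of the hard-coded search_str.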
Checking the i() function confirms this: it sends a GET request to the given URL and resolves to an object {label, data}.
function i(e, n) {
    return t({
        method: "GET",
        url: n
    }).then((function(t) {
        return {
            label: e,
            data: t.data
        }
    }), (function(t) {
        // ...
    }))
}
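Since i() resolves to {label: e, data: t.data}, the notebook replaces the array of requests with plain objects of the same shape; this relies on the code that consumes o (elided above) accepting plain values in place of promises. The sketch below builds that replacement with a hypothetical helper, build_inline_loader. It also adds a precaution that is not in the original notebook: json.dumps does not escape "/", so a description containing "</script>" would end the script tag early, and "<\/" is an equivalent escape in both JSON and JavaScript strings.

```python
import json


def build_inline_loader(manifest: dict, catalog: dict) -> str:
    """Build a JS array literal with the same {label, data} shape that i() resolves to."""

    def to_js(obj: dict) -> str:
        # Escape '</' so embedded text such as '</script>' cannot close the <script> tag
        return json.dumps(obj).replace('</', '<\\/')

    return ("o=[{label: 'manifest', data: " + to_js(manifest) + "},"
            "{label: 'catalog', data: " + to_js(catalog) + "}]")


# Toy inputs with a hypothetical model description
manifest = {'nodes': {'model.my_project.orders': {'description': 'See </script> tag'}}}
catalog = {'nodes': {}}
loader = build_inline_loader(manifest, catalog)
print(loader)
```

The result can be used as new_str in the replacement cell; the escaped data still parses back to the original values.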