Construction's Dark Data Problem

Updated: Mar 28, 2023

Dark data, or data which is produced but not utilized, is being generated, stored, and archived on every active construction project; it’s one of the most valuable resources that many don’t know exists. And it carries a hefty price tag. Left unaddressed, dark data may be costing larger E&C firms in excess of $100M USD per annum. Even when identified, many firms face obstacles in putting their dark data to work. However, the benefits of addressing dark data are well worth the necessary investment. Future-ready E&C firms are taking steps to operationalize their dark data, in turn establishing a performance edge over industry competitors and substantially reducing data management and storage costs. If you haven’t yet begun exploring your dark data problem, it’s time to start.

Technology Growth in an Expanding Market

The construction market is growing, despite the spending cuts and project contractions that resulted from the COVID-19 pandemic in 2019 and 2020. The global construction industry was valued at $12.6T USD in 2020, with a projected CAGR of 7.4% between 2021 and 2028. Construction spending has rebounded from the lows experienced in 2019 and 2020; total industry spend in the United States was 12% higher in July 2021 than average spend in 2019. The industry outlook is positive, although the impacts of the recently discovered Omicron variant remain to be quantified.

The construction industry isn't only experiencing a substantial market rebound as we move into the third year of the COVID-19 pandemic; it's also experiencing a rebound in technology investment. Total industry technology investment in 2018 reached levels that may have seemed unfathomable to those who still view construction as an industry lagging in digitalization. Investors flooded the space between 2015 and 2018, with a total investment of $3.1B USD in the latter year. While 2018 was a peak year for acquisitions, including PlanGrid by Autodesk and Viewpoint by Trimble, funding declined substantially in 2019 as projects were placed on hold and construction sites shut down due to COVID-related lockdowns and CAPEX pullbacks.

Technology funding in 2021 is again soaring to new highs, with $2.1B USD in investment as of October. Investment in 2021 outpaced 2020 by 100%, with the greatest growth in late-stage funding. Investors are throwing their weight behind more mature and experienced technology firms that have a market base, established client portfolios, and solutions that address specific industry problems. This investment isn't arbitrary; it's predicated on substantial growth projections for the construction technology market and on assumed growth in digitalization investment as endemic COVID-19 becomes a likely reality.

Yet, despite the rapid deployment of digital solutions prompted by COVID, construction practitioners remain skeptical of adopting new tools and technology. According to a recent survey of Canadian construction contractors, three quarters of respondents rank their digital maturity as fairly low. 35.9% of industry practitioners are hesitant to try new technology, according to JB Knowledge, a consultancy based in the United States. Despite a strong market rebound, significant investment in the construction technology space, numerous unicorn exits, massive increases in technology deployments on projects, and a need to keep a portion of the workforce remote, the industry remains somewhat tempered in its digital focus. However, the hesitation currently demonstrated by some firms may be costly in years to come.

Growth in Data Production

There are nearly 1 million general contractors in the United States, and an estimated 3 million to 5 million workers on US construction sites at any given time. Each of those workers contributes to the production of project data. Whether they are developing spreadsheets, capturing progress in field reports, or filling out timesheets, project data volume increases by the minute; data growth will become far more substantial as market growth continues and project digital solution deployments increase.

The volume of data produced in the engineering and construction industry tripled between 2018 and 2021. Firms that are currently struggling with data management will find this challenge more pronounced in years to come as project sizes increase and market expansion continues. Further, construction professionals already spend 13% of their working hours searching for data needed to perform their tasks. That's 12% too much time invested in unproductive work resulting from the sheer volume of data already being produced on construction projects today. And this challenge isn't phase specific; engineers, supply chain managers, construction professionals, and commissioning coordinators are all spending inordinate amounts of time searching for the information they need to do their work. This trend cannot continue without negative impacts on project cost and schedule performance.
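To make the 13% figure concrete, the following is a rough back-of-the-envelope sketch in Python. Only the 13% search share comes from the text above; the function names, the 40-hour week, the 48 working weeks per year, and any headcount or hourly rate passed in are illustrative assumptions, not industry benchmarks.

```python
# Share of working hours spent searching for data, as cited in the article.
SEARCH_SHARE = 0.13


def weekly_search_hours(headcount, hours_per_week=40.0, search_share=SEARCH_SHARE):
    """Hours per week a team spends searching for data.

    hours_per_week is an assumed value, not a figure from the article.
    """
    return headcount * hours_per_week * search_share


def annual_search_cost(headcount, hourly_rate, weeks_per_year=48,
                       hours_per_week=40.0, search_share=SEARCH_SHARE):
    """Estimated yearly labor cost of search time.

    hourly_rate and weeks_per_year are hypothetical inputs chosen by the caller.
    """
    weekly_hours = weekly_search_hours(headcount, hours_per_week, search_share)
    return weekly_hours * weeks_per_year * hourly_rate


if __name__ == "__main__":
    # Hypothetical 50-person project team at an assumed blended rate of $60/hour.
    print(f"Weekly search hours: {weekly_search_hours(50):.1f}")
    print(f"Annual search cost: ${annual_search_cost(50, 60.0):,.0f}")
```

Under these assumptions, a single person on a 40-hour week loses a little over five hours a week to searching; swapping in a firm's own headcount and rates gives a first-order view of what that time costs.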

2.5 quintillion bytes of data are produced daily; while construction projects account for a percentage of that production, the total volume of data produced by a single project team seems unfathomable to most. And it's only growing. Most teams struggle not only to employ and utilize their data, but more simply to understand and find it all. Without knowing what is being produced, and where it's stored, making use of existing data appears an impossible task. But it's one worth undertaking. The rewards of doing so are not only financial; they’re competitive.

Dark Data

Of all of the data produced on construction projects, only 4% is ever used. 96% of construction project data isn't visible or isn't used by other project stakeholders. This invisible data is a key contributor to value loss, both qualitative and quantitative, on construction projects globally. The larger the project, the more pronounced the loss.

Dark data is data which is produced through business operations but which creates no decision-making or business value. Teams expend significant effort to capture and create this data, yet it's never operationalized. In many instances, other stakeholders aren't even aware it exists. It’s produced throughout the project lifecycle by facility owners, design teams, construction teams, subcontractors, and commissioning specialists. At any stage in the project lifecycle, a data audit will reveal a multitude of dark data sources.

There are various shades of data darkness due to project delivery strategies and complex stakeholder structures. Many construction projects are global, yielding opportunity for dark data creation due to lack of cross-team collaboration. As teams work in separate locations, they store data on local devices and servers that in some cases are inaccessible to those in other locations. Therefore, some data may be completely dark, or accessible and understood only by the creator. Some data may be grey, accessible and understood by some stakeholders and not by others.

Dark data is more prevalent in construction than in other industries. According to Veritas, an average of 52% of an organization's data is dark, which is a stark difference from construction’s 96%. However, even 52% is substantial. For an organization that stores ~900TB of data, dark data will cost an average of $2.3M USD in storage per annum. Another $1.8M USD is spent on stale data storage, or storage of data that no one has used in 3+ years. Only 1.5% of data storage expenditures are attributed to data that drives business value.

Bad Data Confounder

Confounding the dark data problem is bad data, or data which is inaccurately captured, is inconsistent with other data (data quality issues), or is incomplete (missing fields). According to a recent publication by FMI and Autodesk, 30% of organizations report that more than half of their data is bad. Bad data is estimated to have carried a $1.8T USD cost to the global construction industry in 2020. According to the team, the total cost of bad data to a contractor generating $1B USD in annual revenue is $165M USD, with $7.1M USD in resultant avoidable rework.

Construction project teams are not only failing to employ their dark data; they're also making incorrect or inaccurate decisions with bad data. Engineering and construction firms too often lack strategies and tactics for managing data as a core resource that is required for project success. Without these strategies and tactics, data use is ad hoc, and countless bad decisions are made simply because teams lack access to a data source that could have informed the process or provided useful metrics.

Data as a Driver for Change

Construction projects move quickly, regardless of the project stage. Throughout the project lifecycle, from design through procurement, to construction and commissioning, decisions are made quickly and can have long-lasting impacts in downstream project phases. Decisions based on missing or incorrect data will too often miss the mark; the outcomes may be dire for high-stakes decisions that impact organizational trajectory.

While the quantifiable drawbacks of prevalent dark data and bad data are clear, there are many other industry drivers for firms to consider when it comes to data management. Cost and schedule overruns on construction projects may be partially attributed to dark data. In the past 3 years, only 31% of projects were completed within 10% of the budgeted project cost. 77% of megaprojects are delivered 40% late. Cost and schedule overruns in the construction industry are not only common; they are anticipated. Better deployment and use of data and analytics tools can help teams combat project cost and schedule misses by enabling interventions based on leading indicators at key moments that positively impact project trajectories.

Workforce shortages continue to concern industry executives and project leaders alike. In a recent survey by Deloitte, 52% of executives in the engineering and construction industry stated that their firms are facing severe labor shortages. Substantial resource hours are lost searching for data and information. While resource shortages may not ease in the near term, better data management and increased accessibility will reduce the resource hours spent on the seemingly endless data searches needed to deliver project activities.

The Way Forward

The greatest obstacle that teams will face in the dark data journey is the seeming complexity of gaining control over a resource that has been uncontrolled for so long. However, the volume of data resources will only grow over time. It is imperative that teams begin or accelerate their journey now as waiting will only make the task more arduous in the future.

Industry firms will benefit substantially from developing and deploying a data strategy that supports the operationalization of dark data. Only 55% of industry firms have a formal data strategy established, which makes this the logical starting point for teams looking to generate more value from the data they're producing. Executive teams should prioritize the development and implementation of a formal data strategy within their corporate strategy programs.

Larger organizations face a greater uphill battle on the dark data journey than smaller ones. Firms that generated over $500M USD in revenue were more likely to cite a lack of leadership or organizational support as a reason for not having developed and deployed a data strategy. Boards and executives must not only prioritize data strategy development at an organizational level, but also ensure that support transcends all levels of the organization; managers and front-line project leaders must have executive and senior management support as they deploy digital project strategies that support organizational objectives.

Firms that seek to operationalize their dark data must invest in inventorying and documenting their existing data sets. Depending on the size of the organization, this may be a substantial task. But, without an inventory, teams will have no way of identifying sources of dark data and addressing them. Processes must be established for future data generation to ensure that data sets are identified, documented, produced, and managed in a way that enables access, validation, and management efficiency.
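As a minimal sketch of what a first-pass inventory could look like, the Python below walks a shared folder, records each file's size and last-modified time, and flags files that fall outside a three-year window, mirroring the stale-data definition used earlier in this article. The names `InventoryEntry`, `build_inventory`, and `stale_share` are hypothetical, and last-modified time is only an assumed proxy for "last used"; a real inventory would also need to cover databases, cloud platforms, and access logs.

```python
import os
import time
from dataclasses import dataclass
from pathlib import Path

STALE_AFTER_YEARS = 3  # the article's threshold for stale data
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


@dataclass
class InventoryEntry:
    path: str
    size_bytes: int
    last_modified: float  # seconds since the epoch
    stale: bool


def build_inventory(root, now=None):
    """Walk `root` and return one InventoryEntry per file found.

    A file is flagged stale when its last-modified time is older than
    STALE_AFTER_YEARS. This is an assumed proxy for "not used".
    """
    now = time.time() if now is None else now
    cutoff = now - STALE_AFTER_YEARS * SECONDS_PER_YEAR
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            full_path = Path(dirpath) / name
            info = full_path.stat()
            entries.append(
                InventoryEntry(
                    path=str(full_path),
                    size_bytes=info.st_size,
                    last_modified=info.st_mtime,
                    stale=info.st_mtime < cutoff,
                )
            )
    return entries


def stale_share(entries):
    """Fraction of inventoried bytes that belong to stale files."""
    total = sum(entry.size_bytes for entry in entries)
    stale = sum(entry.size_bytes for entry in entries if entry.stale)
    return stale / total if total else 0.0
```

Calling `build_inventory` on a project share and passing the result to `stale_share` gives a rough first indicator of how much stored data nobody has touched recently, which can help prioritize where a fuller data audit should begin.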

Data accessibility is the enabler for effective data management. Firms must evaluate, procure or develop, and deploy solutions that facilitate data capture, validation, storage, access, and analytics to make more effective use of the plethora of data. Gone are the days of local spreadsheets and cloud folders with heavily restricted access rights. While some data is sensitive and must be protected, most data must be shared and made readily available for it to create value. Firms must establish solutions and architecture that empower access to data rather than overly restrict it.

Data accessibility may be partly facilitated through the development and deployment of analytics programs that foster improved decision making. Descriptive analytics programs combine multiple sources of data in visual data models to generate insights and empower good decisions. More advanced teams are leveraging predictive analytics to derive insight into what may happen and take anticipatory action to course correct or capitalize on opportunities. Modern project teams are reviewing dashboards and reports with near real-time data to identify trends, spot issues, and mitigate risks before they negatively impact a project.

Data literacy continues to lag in the engineering and construction industry. Many stakeholders are comfortable discussing data when it pertains to spreadsheets and documents, but quickly become uncomfortable at the mention of big data, data lakes, or predictive analytics. Formal data training is fundamentally lacking. As data literacy increases, the support for data initiatives will increase. Those organizations that invest in data literacy programs will derive competitive gains over others who lag on what are sure to become core competencies for engineering and construction firms of the future.

Asking the Right Questions

Devising a data strategy and deploying a successful data program requires that the resultant insights and analytics help project teams perform better. Too often, leadership teams embark on a data journey, spending countless hours and funds on collecting, validating, and storing data that creates little or no business value. If teams don’t take the time to have deep, exploratory conversations with their project teams, and determine the data-based questions that those teams need to answer, any program deployed will likely miss the mark.

To be successful on the data journey, data must answer critical questions for those executing the work. Charts and graphs serve no purpose if they fail to capture and report on metrics that will empower project teams to identify risks, answer questions, and make good decisions. Leadership teams should engage with key team members to identify sources of truth, capture metric requirements, and identify commonalities across projects which may combine to form a holistic set of organizational data standards.

Talk to clients to determine their data requirements and discover the insights they find valuable. Ask your engineering teams how data can be better produced, captured, stored, and accessed to improve productivity and workflow. Ask front-line supervisors how data capture and access can improve task safety and productivity. Invest the time in determining where your real data ROI is; the business benefit of safe work execution on-site far exceeds the ability to pull up a chart and visualize historical performance which can’t be influenced.


Dark data and bad data are contributing to value loss on construction projects, but with executive and leadership support this loss can be turned into value gains through the implementation of data management strategies, the operationalization of dark data, and the deployment of analytics programs. Data is a resource that is already harvested and produced on every construction project currently underway. Engineering and construction firms must now find ways to document, structure, and operationalize this value-laden resource to derive benefit from it. While deploying data programs, digital solutions, and training will require investment, recouping the hundreds of millions currently being spent on dark and bad data is well worth it.

