How to properly use data in your organization

Bowen
16 min read · Nov 27, 2021


As someone who has worked with data both academically & professionally, I have spent a lot of time thinking about data, its value, and its role within organizations.

I was a Teaching Assistant for the 2U/Trilogy Education Full Stack Development Bootcamp at UNC-CH (a top public university in North Carolina). I was new to engineering and web development and, to be honest, was fortunate to have the talent and strong interest that let me take on this role. I had just completed my second 6-month section on the academic team when I learned that UNC-CH wanted to add a Data Analytics & Visualization Bootcamp to its catalog.

I was utterly unqualified; however, I became a teaching assistant for the data program because of my communication skills and ability to learn quickly. I excelled at helping pre-professionals advance their careers. My first thought was, “cool, another challenge I will rise to.” My second thought was, “wait, the best application of data in my life is the Weather Channel, and that’s a bunch of hoo-has.” So, like an eager world-overcomer, I worked through the course on my own, taught myself Python (and an array of other skills), and buckled up. It was gratifying and, at times, incredibly scary. I’m proud to say our academic team ended up with a Net Promoter Score (NPS) of 91, which is exceptionally high (for context, the academic units I was on for the Full Stack Development Bootcamp scored in the 68–74 range, which was good).

After successfully teaching a subject that was new to me, I left my previous company (@Genesys) and started a new role (@Epic Games) focused on engineering a suite of analytics tools and wrangling big data. While I learned the technical language of data by diving deep into these positions, this article won’t address that side. Instead, it draws on my critical thinking, outlook, research, and continued education (CORe @HBS Online, Disruptive Strategy @HBS Online, Business Analytics @MIT Sloan Online).

The following sections address the value of integrating data within a corporation, strategies for integrating data, the Key Performance Indicators (KPIs) that matter within organizations, common pitfalls, and examples. Along the way, we will follow the story of Apple.

Please refer to the Definitions section at the bottom for the bolded terms.

What makes great products?

I am not going to try to summarize why Apple is where it is today. We all know that, in some variation, the relentless combination of Steve Jobs (the creative, visionary disruptor) and Steve Wozniak (the technical, inventive disruptor) launched Apple into the stratosphere. Yes, it was visionaries, not data, that made Apple a household brand. However, data is now a huge part of Apple’s culture. The critical difference is that data enhances excellent products; data does not make great products. This exact difference is what makes Apple such a good canvas for this conversation.

Figuring out a successful, impactful product is a “human’s job.” I often reference the idea that for every market problem, there are 10,000 market solutions. I find this imbalance interesting because only a couple of solutions out of that corpus actually get the job done for the consumer. Almost always, the products and services that make a market impact originate from a “human,” not a data set.

However, once a great product or service is introduced, data can optimize its growth, impact, and overall success. In January 2001, Apple launched iTunes, and in April 2003 it opened the iTunes Music Store, disrupting the music industry with a simple, legal platform to buy, store, and listen to music over the internet. While Napster came first in online music, it did not reach the same adoption as iTunes because it lacked Apple’s consumer base and the simplicity and fluidity of Apple’s UX. There was no team of analysts and data engineers behind the release. However, Apple did use data to improve it.

In 2008, Apple released iTunes Genius, which collected data on all of its listeners and cross-referenced it to create informed “suggested” playlists and general music suggestions for its consumers. iTunes Genius was a great use of data and utilized prescriptive analytics. Yes, Apple has since been eclipsed by Spotify in recommendations and in the disruptive shift from owning music to streaming subscriptions. However, iTunes Genius remains an early example of how data can make a great product better.

For context, here is how iTunes Genius fundamentally worked: Apple looked at all of its users’ music libraries and broke them down into categories like genre, artist, and song. Then it cast archetypes: country music fans, rock music fans, mixed fans, etcetera. From there, it cross-referenced differences between libraries at the archetype level. For example, suppose 10% of users are country music fans, and 90% of those country fans own the Garth Brooks album Fresh Horses. Of the fans who own Fresh Horses, 50% also own music by Clay Aiken and Dolly Parton. Then, for every country music fan who owns Fresh Horses but does not have music by Clay Aiken or Dolly Parton, iTunes Genius should recommend Clay Aiken and Dolly Parton.
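To make that co-occurrence logic concrete, here is a minimal Python sketch, assuming toy libraries stored as sets of item names. The data, the `co_occurrence_rates` and `recommend` helpers, and the 50% cutoff are all hypothetical; for brevity it skips the archetype step and conditions directly on owning Fresh Horses. It is my simplification of the idea, not Apple’s actual Genius implementation.

```python
from collections import Counter

# Hypothetical toy libraries: user -> set of owned items (albums or artists).
# A simplified co-occurrence sketch, NOT Apple's actual Genius algorithm.
libraries = {
    "u1": {"Fresh Horses", "Dolly Parton", "Clay Aiken"},
    "u2": {"Fresh Horses", "Dolly Parton"},
    "u3": {"Fresh Horses"},
    "u4": {"Fresh Horses", "Clay Aiken", "Dolly Parton"},
    "u5": {"Master of Puppets"},
}

def co_occurrence_rates(libraries, anchor):
    """Among users who own `anchor`, the share who also own each other item."""
    owners = [lib for lib in libraries.values() if anchor in lib]
    if not owners:
        return {}
    counts = Counter(item for lib in owners for item in lib if item != anchor)
    return {item: n / len(owners) for item, n in counts.items()}

def recommend(user, libraries, anchor, min_rate=0.5):
    """Suggest items that at least `min_rate` of the anchor's owners have but `user` lacks."""
    library = libraries[user]
    if anchor not in library:
        return []
    rates = co_occurrence_rates(libraries, anchor)
    return sorted(item for item, rate in rates.items()
                  if rate >= min_rate and item not in library)

print(recommend("u3", libraries, "Fresh Horses"))  # ['Clay Aiken', 'Dolly Parton']
```

A production recommender would weigh far more signals (play counts, skips, ratings), but the core question is the same one the example asks: “people like you who own X usually also own Y.”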


What kind of data should every corporation use?

There is a lot of buzz about “Deep Learning,” “AI,” “Machine Learning,” “Quantum Computing,” etcetera. A few startups have even received venture capital funding solely for preaching the buzzwords. I assure you that your company does not need these technologies to survive.

The most fundamental data a company should focus on is cash flow, burn rate, consumer adoption, and general accounting.

You should know every expense, liability, & asset you have.

You should know your customer acquisition cost (this applies to B2B businesses as well).

You should have grounded ideas about corporate valuation, funding, and investment outlook.

You should know consumer sentiment and employee sentiment.

(If applicable) You should have excellent customer service and clear outlets for consumers to communicate their needs.

Done! A company shouldn’t need anything more than that to survive. Hard data that is universally understood is enough!
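Several of these basics are simple arithmetic once the numbers are collected. Here is a small Python sketch of net burn, cash runway, and customer acquisition cost (CAC), assuming a hypothetical monthly snapshot; the field names and figures are illustrative, and many companies define CAC more broadly (for example, including sales salaries), so treat this as one common simplification.

```python
from dataclasses import dataclass

@dataclass
class MonthlyFinancials:
    """Hypothetical monthly snapshot; field names are illustrative."""
    starting_cash: float
    ending_cash: float
    sales_and_marketing_spend: float
    new_customers: int

def net_burn(m: MonthlyFinancials) -> float:
    """Net cash consumed in the month (positive means the cash balance went down)."""
    return m.starting_cash - m.ending_cash

def runway_months(m: MonthlyFinancials) -> float:
    """Months of cash left at the current net burn; infinite if the company is not burning cash."""
    burn = net_burn(m)
    return float("inf") if burn <= 0 else m.ending_cash / burn

def customer_acquisition_cost(m: MonthlyFinancials) -> float:
    """Sales and marketing spend divided by the number of customers it acquired."""
    if m.new_customers == 0:
        raise ValueError("no new customers this period; CAC is undefined")
    return m.sales_and_marketing_spend / m.new_customers

november = MonthlyFinancials(starting_cash=500_000, ending_cash=450_000,
                             sales_and_marketing_spend=30_000, new_customers=120)
print(net_burn(november), runway_months(november), customer_acquisition_cost(november))
# 50000 9.0 250.0
```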

Where should data live in an organization?

For small organizations, a couple of people responsible for data should sit within the existing teams: an accountant, a marketing analyst, maybe a data engineer. For large companies, however, no matter the existing structure, data should have a level of separation.

Following this definition provided by Indeed,

A business unit is a separate division within a company that often develops and implements its own processes independently from the core business or brand while still adhering to the overall company policies. Typically, large brands adopt this kind of structure to better organize and track metrics like revenue or costs for each division. Having a structured business unit allows each unit to manage its own profits and costs, which can help companies monitor and reduce their overall costs associated with various department functions.

we can apply business unit theory to Apple and to how it might place a “data” unit within its organization.

Looking at the diagram to the left, we could imagine that units A-C are Hardware, Software, & Entertainment within Apple’s ecosystem. The “data” unit lives to serve all three, as a depot of à la carte information.

Separating the “data” unit and making it available to all of the separate business units creates an ideal scenario for using data efficiently, effectively, and in a healthy way throughout the organization.

However, this layout CAN crumble under the organizational pitfalls described in the following section.

Data pitfalls within corporations and how to fix them

There are four major pitfalls of data management within most corporations today:

Untamed growth of data

Repetition of ETL work due to lack of documentation

Poor communication between business units

Misinformation caused by improper standards

The truth about data is that it is messy. Whether the data you collect is deep, broad, or both, organization is critical. If you use more than 10 data sources or more than 100 data structures, stop and standardize. One of the key attributes of data’s nature is its tendency to grow, so you need to decide whether you want a well-kept lake or a sewage plant.

Use standardized naming conventions

Thoughtfully consider how to categorize different types of data

Put deep thought into what platform your data engineers will use

Put deep thought into what platform your analysts will use

Create a framework for documenting data (beyond Jira & Confluence)
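A naming convention only helps if it is enforced, and the cheapest enforcement is an automated check in review or CI. Below is a minimal Python sketch, assuming a hypothetical convention of lowercase snake_case table names that start with an approved business-domain prefix (for example `finance_invoices_daily`); the domain list and the rule itself are illustrative, so swap in whatever your organization agrees on.

```python
import re

# Hypothetical convention: <domain>_<entity>[_<qualifier>...], lowercase snake_case,
# where <domain> must come from an agreed-upon list.
APPROVED_DOMAINS = {"finance", "marketing", "product", "support"}
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)+")

def naming_problems(table_name):
    """Return the reasons a table name breaks the convention (empty list if it passes)."""
    problems = []
    if not SNAKE_CASE.fullmatch(table_name):
        problems.append("not lowercase snake_case with at least two parts")
    domain = table_name.split("_", 1)[0]
    if domain not in APPROVED_DOMAINS:
        problems.append(f"unknown domain prefix '{domain}'")
    return problems

for name in ["finance_invoices_daily", "FinanceInvoices", "tmp_data"]:
    print(name, naming_problems(name))
```

Returning a list of reasons, rather than a bare pass/fail, tells the person creating the table exactly what to fix.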

Untamed Growth causes every contributor to the data pipeline to become overwhelmed. More data means more problems. You generally want to be particular about what data you want. You want to be constantly cutting down unneeded or unused data. Every new data source, internal or external, should go through a thorough audit before implementation. Why do we need this data? Is there another data source that we already have that addresses the need for this data? What is the net impact of collecting this data compared to other sources of potential new data?
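The second audit question, whether a source we already have addresses the same need, can be partially automated by comparing the fields a proposed source would provide with the fields already in the catalog. Here is a minimal Python sketch using Jaccard similarity (shared fields divided by all fields). The catalog contents, source names, and the 0.5 threshold are hypothetical, and field overlap is only one signal: “why do we need this?” still needs a human answer.

```python
# Hypothetical catalog: existing source name -> set of fields it provides.
EXISTING_SOURCES = {
    "crm_accounts": {"account_id", "company", "industry", "region"},
    "billing_invoices": {"invoice_id", "account_id", "amount", "due_date"},
}

def jaccard(a, b):
    """Share of fields two sources have in common (0 = disjoint, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def audit_new_source(name, fields, threshold=0.5):
    """Flag existing sources that overlap heavily with a proposed new source."""
    overlaps = {src: jaccard(fields, existing) for src, existing in EXISTING_SOURCES.items()}
    flagged = {src: score for src, score in overlaps.items() if score >= threshold}
    return {"source": name, "needs_justification": bool(flagged), "overlaps_with": flagged}

print(audit_new_source("vendor_firmographics",
                       {"account_id", "company", "industry", "employee_count"}))
```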

Repetition often occurs when various stakeholders within an organization request the same comp or data set. In this situation, the same dataset can undergo the same ETL and analysis more than once, causing gross corporate inefficiency. Repetition can be avoided by keeping a ledger of what data has been requested, who requested it, the current lifecycle stage of the data, a general summary, a detailed description of what was done, and the location of the data. No, Jira is not good enough, and neither is Confluence. There should always be one true source for any requested data source or comp that a stakeholder can visit before making a repetitious request to data teams. The data teams should also consult this “universal source of truth” to verify that the work has not already been done. Moreover, if the data in question is outdated, the ledger can show how to recreate a newer version, reducing overhead. It seems like an obvious way to save time and money; however, I have never seen anything like it in practice.
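The ledger described above is, at its core, a small data structure with one rule: look before you build. Here is a minimal in-memory Python sketch, assuming the fields listed in the paragraph; the class names, the example entry, and the table location are hypothetical, and a real ledger would live in a shared database or catalog tool rather than in a Python dict.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class LedgerEntry:
    dataset: str          # what was requested
    requested_by: str     # who asked for it
    lifecycle: str        # e.g. "in progress", "delivered", "outdated"
    summary: str          # short, plain-language description
    approach: str         # detailed steps (sources, joins, filters) used to build it
    location: str         # where the result lives
    last_refreshed: date

class RequestLedger:
    """Single source of truth that stakeholders and data teams check before starting work."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def _key(dataset):
        return dataset.strip().lower()

    def register(self, entry):
        key = self._key(entry.dataset)
        if key in self._entries:
            raise ValueError(f"'{entry.dataset}' already exists at {self._entries[key].location}")
        self._entries[key] = entry

    def lookup(self, dataset):
        """Return the existing entry for a dataset, or None if it has never been built."""
        return self._entries.get(self._key(dataset))

ledger = RequestLedger()
churn = LedgerEntry(
    dataset="Weekly churn by region",
    requested_by="Sales Ops",
    lifecycle="delivered",
    summary="Share of accounts lost per week, split by sales region",
    approach="Joined CRM accounts to billing cancellations; weekly grain; excludes trials",
    location="analytics.churn_weekly_by_region",
    last_refreshed=date(2021, 11, 1),
)
ledger.register(churn)

existing = ledger.lookup("weekly churn by region")
if existing:
    print(f"Already built: {existing.location} (last refreshed {existing.last_refreshed})")
```

Keeping the `approach` field detailed is what makes the “recreate a newer version” case cheap: whoever refreshes the data follows the recorded steps instead of rediscovering them.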

Poor Communication is often caused by the fact that people who work with data are not always social animals. They think differently than managers and don’t always understand what they are supposed to do because they don’t understand their managers’ needs. What often happens is that the analyst or engineer gets a general idea of what needs to be done and does their best to deliver a valuable product, and then the stakeholder settles for less-than-great comps because that’s what they’ve got. Perfect is not the goal; better is. Stakeholders need to express what data they want, why they want it, and how they want it to look, which means they need to be trained in how to communicate with data engineers and analysts. On the other side of the coin, data engineers and analysts also need to learn to communicate with stakeholders. It sounds like an expensive project; however, how many of you readers have wasted an extreme amount of time because of miscommunication? An alternative is to hire people to act as bridges between stakeholders and engineers/analysts: not project managers, but “communicators.” With this second solution, however, there is a risk of translation errors.

An interesting litmus test a stakeholder can run is to ask the analyst or engineer to recite back (without interruption) what the stakeholder wants. If the recital is out of alignment, at least one of the following must be true:

  1. The stakeholder poorly communicated what they wanted
  2. The analyst or engineer did not understand what the stakeholder wanted

Examples of smart data

User interfaces are troves of data. Apple’s iPhone is an excellent example of micro-optimizing user experience through the collection and analysis of hard data. Apple has the capability to collect user data related to the iPhone interface in a variety of ways, and likely does. For example, Apple might want to collect the time to complete, the number of steps, and a heat map of taps when a user selects and reconnects to a network after it disconnects. Apple would also collect this type of data across a variety of user situations, cross-referencing navigation data from a YouTube video versus an online recipe versus a messaging thread, and then replicate this process for other “user experience” tasks. Using descriptive analytics, they would calculate things like average time, step complexity, and correlations in navigation data. They would also include other forms of hard data, like how frequently users need to perform these actions, and qualitative soft data on whether a task should be a focus of micro-optimization. From there, if a task is selected for optimization, they would likely A/B test a new feature and run a scientific experiment. Does the new UI feature improve the user experience? Does it negatively impact other elements in the interface? Is there a learning curve? Should the feature ship as a software patch or as an upgrade?
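The descriptive step and the A/B question (“does the new UI feature improve the user experience?”) can be sketched in a few lines of Python. The timing data below is hypothetical, and I use a permutation test (a standard, assumption-light significance test that only needs the standard library) as a stand-in for whatever statistical method Apple actually uses, which isn’t public.

```python
import random
from statistics import mean, median

# Hypothetical seconds-to-reconnect for the current UI (control) and a new design (variant).
control = [12.1, 10.4, 14.2, 11.8, 13.5, 12.9, 15.0, 11.2]
variant = [9.0, 8.7, 10.1, 9.5, 8.2, 9.9, 10.4, 8.8]

def describe(samples):
    """Descriptive analytics: basic summary statistics for one group."""
    return {"n": len(samples), "mean": mean(samples), "median": median(samples),
            "min": min(samples), "max": max(samples)}

def permutation_p_value(a, b, iterations=10_000, seed=0):
    """Two-sided permutation test for a difference in means between groups a and b.

    Shuffles the pooled measurements many times and counts how often a random split
    produces a difference at least as large as the one observed.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(iterations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed - 1e-12:  # small tolerance for floating-point rounding
            extreme += 1
    return (extreme + 1) / (iterations + 1)

print(describe(control))
print(describe(variant))
print("p-value:", permutation_p_value(control, variant))
```

A small p-value says the speed-up is unlikely to be chance; it says nothing about the other questions in the paragraph (side effects, learning curve), which need their own metrics.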

Another excellent, though entirely different, example is the Apple MacBook Pro. In the mid-2010s, Apple made various changes to the laptop lineup, notably (a) removing ports like HDMI, the SD card slot, and MagSafe and (b) changing the click mechanism in its keyboards. Instead of locking down metrics about whether a dongle was used to connect an HDMI cable, or about changes in typing speed or backspace usage (which would likely have been fruitless stats), they put their ear to the ground for user feedback. They likely collected tweets that mentioned the product, separated them by sentiment, and then did a qualitative analysis of which negative and positive items were being mentioned. They also likely reached out to influencers, super users, and partnered buyers for feedback, since the most frequent users are usually the first to point out errors and difficulties. It worked: the ports and MagSafe are back, and the user base is arguably happier with the product line.
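The first pass of that process (label posts by sentiment, then see which topics the negative ones mention) can be sketched with a tiny keyword approach in Python. The lexicons, topics, and posts below are made up for illustration, and a keyword counter is a deliberately crude stand-in: a real analysis would use a trained sentiment model and, as the paragraph says, a human reading the complaints.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny lexicons; a real analysis would use a trained
# sentiment model or a maintained lexicon, validated against hand-labelled posts.
POSITIVE = {"love", "great", "beautiful", "fast"}
NEGATIVE = {"hate", "broken", "stuck", "annoying"}
TOPICS = {
    "ports": {"dongle", "hdmi", "sd", "magsafe", "port", "ports"},
    "keyboard": {"keyboard", "key", "keys", "typing"},
}

def words(text):
    return re.findall(r"[a-z']+", text.lower())

def sentiment(text):
    """Label a post by counting positive minus negative lexicon hits."""
    tokens = words(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def complaint_topics(posts):
    """Count which product topics come up in negatively labelled posts."""
    counts = Counter()
    for post in posts:
        if sentiment(post) == "negative":
            tokens = set(words(post))
            counts.update(topic for topic, keys in TOPICS.items() if tokens & keys)
    return counts.most_common()

posts = [
    "I love the new display, it's beautiful",
    "Another dongle just to plug in HDMI. I hate this",
    "My keyboard is stuck again, the B key is broken",
    "Keys stuck after two weeks",
    "Picked up the new MacBook Pro today",
]
print(complaint_topics(posts))  # [('keyboard', 2), ('ports', 1)]
```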

A study conducted by Clayton Christensen reminds me of Apple’s “listening to customers” example. Here is an excerpt from his website:

“A fast food chain interested in improving milkshake sales spent months doing market research, peppering customers with questions about their milkshakes. Was it chocolatey enough? Thick enough? Did it contain the right amount of syrup? But this gave no new insights.

They then brought in two consultants to examine the problem, who were surprised to find that quite a few milkshakes were being sold in the morning. After conducting in-depth interviews, the team discovered that customers were buying milkshakes for breakfast during their morning commute. Instead of caring about thickness or flavor, customers were actually drawn to the fact that it was relatively tidy and could stave off hunger until lunch. In this instance, the competitor wasn’t other milkshakes, but easily consumable breakfast foods like bagels or bananas, giving the chain an entirely new perspective on ways to compete.”

The hard and soft data didn’t matter; sometimes you just need to ask your consumer.

A predictive analytics example for Apple could be a theoretical analysis of its supply chain. Apple has years of historical revenue data and units sold across a variety of products. For new product launches, it has data from items like the original iPhone, iPod Shuffle, AirPods, and Apple TV; for iterations of existing products, it has data on iPhone generations through the iPhone 13, AirPods generations 1–3, etcetera. It also has supplementary data like inflation, price fluctuation, household income, consumer engagement, etcetera. Apple can use features from that data as inputs to a predictive model that gives a rough estimate of how many units it should put into production, as roughly depicted below.
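As a minimal illustration of “features in, estimated units out,” here is an ordinary least squares (linear regression) model written in plain Python. The two features (a consumer-engagement index and average price), the launch history, and the resulting coefficients are all hypothetical, and Apple’s actual forecasting models are not public. The solver is hand-rolled only to keep the example self-contained; in practice you would reach for `numpy.linalg.lstsq` or scikit-learn, use far more history, hold out data for validation, and model seasonality.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(rows, targets):
    """Ordinary least squares with an intercept, via the normal equations (X^T X) beta = X^T y."""
    X = [[1.0, *row] for row in rows]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(X, targets)) for i in range(k)]
    return solve(xtx, xty)

def predict(beta, row):
    """Intercept plus the weighted sum of the features."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))

# Hypothetical past launches: [consumer engagement index, average price in $100s] -> units (millions).
history = [[10, 8], [12, 9], [15, 10], [11, 7], [14, 11]]
units = [24, 29, 37, 28, 33]

beta = fit_ols(history, units)
print(round(predict(beta, [13, 9]), 2))  # 32.0
```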

A prescriptive analytics example was already mentioned with iTunes Genius; however, another example is how Apple might handle its cloud infrastructure. Whether it uses its own servers, AWS, or Google Cloud, a company of that scale is almost certainly using cloud metrics to understand which servers are under heavy load or underutilized. With prescriptive analytics, it can automatically direct traffic from one server to another. This saves money and reduces the risk of outages (which also saves time and money).
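To show what “prescriptive” means here (the output is an action, not just a number), here is a small Python sketch of a greedy rebalancing rule: move requests per second off any server above a target utilization onto the servers with the most headroom. The server names, capacities, and the 70% target are hypothetical, and real systems delegate this to load balancers and autoscalers with far richer policies; this only illustrates the decide-and-act step.

```python
def rebalance(loads, capacity, target=0.7):
    """Greedy rule: move requests/sec off servers above `target` utilization onto
    the servers with the most headroom. Returns (moves, new_loads).

    If every server is saturated, some excess may remain; that case needs more capacity.
    """
    loads = dict(loads)  # don't mutate the caller's dict
    limit = {s: capacity[s] * target for s in loads}
    moves = []
    # Handle the most overloaded servers first.
    for src in sorted(loads, key=lambda s: loads[s] - limit[s], reverse=True):
        excess = loads[src] - limit[src]
        # Fill the servers with the most spare room first.
        for dst in sorted(loads, key=lambda s: limit[s] - loads[s], reverse=True):
            if excess <= 0:
                break
            room = limit[dst] - loads[dst]
            if dst == src or room <= 0:
                continue
            shift = min(excess, room)
            loads[src] -= shift
            loads[dst] += shift
            excess -= shift
            moves.append((src, dst, shift))
    return moves, loads

moves, new_loads = rebalance({"a": 900, "b": 300, "c": 650},
                             {"a": 1000, "b": 1000, "c": 1000})
print(moves)
print(new_loads)
```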

Data collection & presentation standards

Another consideration is the culture in which data is used. Data is a double-edged sword.

In an outrageous hypothetical, let’s say that Apple presents a time-series forecast of future consumer sentiment to its Board of Investors to show faith in the quality of the Apple brand over time. The presenter, a senior executive with a personal bias, believes that the Apple brand is invincible and uses the time-series forecast to solidify his opinion. The investors are happy.

A month later, Apple stock crashes because of internal ethics issues. Data sets from third-party outfits forecasting consumer sentiment toward Apple (built from a representative sample) showed this was going to happen. The investors are mad because these “uncovered” predictions directly contradicted what the senior executive had presented to them a month earlier.

The senior executive, pressed to answer for the complete failure of the data, yells down the ladder and soon finds out that the time-series forecast WAS built on a representative sample; however, it WAS a representative sample of current Apple employees!

Now, that is a ridiculous story; however, it illustrates the point that no matter how much a person or entity might agree (or disagree) with the data, it is ALWAYS important to qualify the data.

Where did the data come from?

What is the overall quality of the data source?

What are the pitfalls of the data source?

If modeling was used, how does the model work?

What is the implicit bias of the data outcome?
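Part of the first and third questions can be checked mechanically: does the sample look like the population you claim it represents? Here is a minimal Python sketch that compares a sample’s segment mix against known population shares and flags large gaps. The segment names, population shares, and 5-point tolerance are hypothetical, and the population shares themselves must come from a source you trust; this catches the story’s “everyone surveyed was an employee” failure, not every kind of bias.

```python
from collections import Counter

def representativeness_gaps(sample_segments, population_shares, tolerance=0.05):
    """Flag segments whose share of the sample differs from their known share of the
    population by more than `tolerance`. Returns {segment: (sample_share, population_share)}."""
    if not sample_segments:
        raise ValueError("empty sample")
    counts = Counter(sample_segments)
    n = len(sample_segments)
    gaps = {}
    for segment in sorted(set(population_shares) | set(counts)):
        sample_share = counts[segment] / n
        population_share = population_shares.get(segment, 0.0)
        if abs(sample_share - population_share) > tolerance:
            gaps[segment] = (sample_share, population_share)
    return gaps

# The survey from the story: every respondent turned out to be an employee.
population = {"consumer": 0.999, "employee": 0.001}  # hypothetical shares
print(representativeness_gaps(["employee"] * 500, population))
# {'consumer': (0.0, 0.999), 'employee': (1.0, 0.001)}
```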

The data engineer, the data analyst, and the presenter should not only be able to answer these questions, but should also qualify any decisions made from that data with those answers. (There are, of course, more questions to ask than the ones above.) Asking and answering these questions improves the overall quality of the data, increases confidence in both the data and the decision, and occasionally, better yet, prevents a bad decision from being made.

If I could wish one thing for organizational use of data, what would it be?

To be honest, it’s pretty simple and can be narrowed down to one specific thing.

Organizations should hire a consumer psychologist and actually do research! It was alluded to in the Clayton Christensen example; however, it is my firm belief that the easiest way to find out what consumers want or how they feel is simply to ask them. It seems silly; however, few modern companies care to do this or even think of doing it. More of it should be done.

Final Thoughts

At the beginning of this post, I talked about how I was not optimistic about data and called the Weather Channel “a bunch of hoo-has.” It’s funny: having worked with, taught, and studied big data for over 4 years now, I have a new opinion about the Weather Channel.

  1. They qualify the data; if they are not sure it’s going to rain they say “40% chance of rain”
  2. They are constantly improving their data collection; expanding the network of weather sensors, expanding on their knowledge
  3. They use data from real people in real-time
  4. They don’t rely on one data set or model as the universal source of truth; they use data orchestras and communicate with differing weather centers.
  5. They are constantly looking for new and improved ways to forecast
  6. Also, I don’t know if you have ever watched the Weather Channel during a hurricane, but boy, do they love what they do.
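Points 1 and 4 fit together: one common way forecasters arrive at a qualified number like “40% chance of rain” is to run an ensemble (many slightly different model runs, often from several centers) and report the share of runs that produce rain. Here is a minimal Python sketch of that idea with made-up rainfall values from two hypothetical models. The 0.25 mm threshold approximates 0.01 inch, the amount the US National Weather Service uses for its probability of precipitation; I am not claiming this is the Weather Channel’s actual method.

```python
def chance_of_rain(*model_runs, threshold_mm=0.25):
    """Percent of ensemble members, pooled across models, forecasting at least threshold_mm of rain."""
    members = [mm for run in model_runs for mm in run]
    if not members:
        raise ValueError("no forecasts supplied")
    wet = sum(mm >= threshold_mm for mm in members)
    return round(100 * wet / len(members))

# Hypothetical rainfall (mm) from ensemble members of two independent forecast models.
model_a = [0.0, 1.2, 0.0, 3.4, 0.0]
model_b = [0.5, 0.0, 0.0, 0.1, 2.0]
print(f"{chance_of_rain(model_a, model_b)}% chance of rain")  # 40% chance of rain
```

Pooling several models instead of trusting one is exactly the “no single universal source of truth” habit from point 4, and reporting a percentage instead of yes/no is the qualification from point 1.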

It’s funny: they are often the subject of jokes for never being 100% sure; however, I think the job they do with data is perfect. They’re constantly improving, and they let me know if I need an umbrella!

Corporations should be using data the same way the Weather Channel does.

Definitions

Hard Data — Universally true data. For example, accounting is “hard data”: a company made X revenue with Y profit because of Z expenses.

Soft Data — Data that contains subjectivity. For example, Bloomberg terminals provide forecasting data that gives investors, news reporters, and banks a peek into the future (often produced with time-series forecasting, which frequently relies heavily on seasonality).

Descriptive Analytics — Analytics that read data points, correlate them, or otherwise shape them into interpretable compositions. An example of descriptive analytics is a breakdown of the minimum, median, maximum, and average height of all NFL players in the 2021 season.

Predictive Analytics — Analytics that are the outcome of some feature set, algorithmic computation, or modeling. For example, health insurance companies assign rates to individuals based on their predicted health risks, modeled by the outcomes of previous or existing insurance holders.

Prescriptive Analytics — Analytics that guide decisions in real time or close to real time and provide insight into the best course of action. An example of prescriptive analytics in a real-time environment is the auto-landing system on a Boeing airplane. An example of non-real-time prescriptive analytics is logistics engineers optimizing traffic lights.
