Cybersecurity Has a Data Quality Issue
Which is why there are so many 'lemonade makers'
An episode of the Cloud Security Podcast caught my eye because it featured an interview with Edward Wu, founder and CEO of Dropzone, a company focused on SOC automation. I interviewed Edward on my own podcast in 2024, so I was curious to hear his update on the market, given how fast AI has been moving. Back then, I’m not sure we were even saying “agentic” yet, and MCP didn’t exist.
I highly recommend watching the full Cloud Security Podcast episode. Edward Wu always comes across as honest and speaks without hyperbole. I get the sense that, even as CEO, he still has an engineering role within his startup, or at least, he remains very close to the tech development. Ashish Rajan asks some excellent questions, prompting Wu on exactly the specifics I was hoping to hear more about.
Much of the discussion covers the parts of SecOps that AI can’t automate or solve. It also covers the obstacles that keep AI from succeeding no matter how intelligent it is, like institutional knowledge that isn’t documented anywhere.
Watching the episode, I’m reminded of how much of the funding in the cybersecurity industry is going to the lemonade makers. If you haven’t read my essay, A Market for Lemonade, the TL;DR is that a lot of cybersecurity vendors (the lemonade makers) exist to solve problems created by other cybersecurity vendors (the lemons). It’s worth exploring why this is the case.
Why everyone wants to make lemonade
Simply put, it’s easy to build a product that analyzes security data you already have. You can’t find threats in data you don’t have, however. Too often, we miss the fundamental questions: is this the right data? Is the data correct? Is the data complete?
It’s hard to build a library of 200,000+ vulnerability checks, so startups in the vulnerability/exposure management space are almost exclusively lemonade makers (RBVM, UVM, CTEM) - at least, where we’re talking about infrastructure scanning (i.e. vulns linked to CVEs). The only innovator in the vulnerability scanning space in the past 20 years was a small startup out of Montreal called Delve Labs. It was acquired by Secureworks (now Sophos) and renamed Taegis VDR.
The challenges don’t stop with building the vulnerability checks. The buyer has a lot of responsibility here that can impact the quality and completeness of data. Practitioners have to configure the product effectively (configuring a vuln scanner is easy to mess up). They have to input the correct lists of assets for the scans. They have to connect it to the right accounts.
A short anecdote might help to put this issue in focus. Many years ago, a friend and I founded a security consulting firm. One of our main products was helping to build security processes, which included checking the configuration of security products.
One client was scanning all 14 of their websites for security issues. However, they had somehow misspelled 13 of the 14 domain names, leaving the ‘m’ off .com (no .co versions of these websites existed). Because one of the domain names was spelled correctly, that one site was still getting scanned.
Since they were receiving results, they assumed everything was fine. They weren’t aware that all these results were from one website. There was a huge data gap they weren’t aware of. The product wasn’t designed to tell them, “hey - 13 of these websites you’re scanning have invalid domain names, you should probably fix that.”
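A gap like this is cheap to catch by reconciling what you asked the scanner to scan against what it actually reported on. The sketch below is only an illustration, not a feature of any particular scanner: the function names are made up, and it assumes you can export the configured targets and the URLs attached to findings as plain lists.

```python
from urllib.parse import urlsplit


def hostname_of(target: str) -> str:
    """Normalize a scan target or result URL to a lowercase hostname."""
    # urlsplit only fills in .hostname when a scheme or '//' is present,
    # so bare hostnames get a '//' prefix first.
    parts = urlsplit(target if "//" in target else "//" + target)
    return (parts.hostname or "").lower()


def silent_targets(configured: list[str], result_urls: list[str]) -> list[str]:
    """Return configured targets that produced no results at all."""
    seen = {hostname_of(url) for url in result_urls}
    return [t for t in configured if hostname_of(t) not in seen]


# Hypothetical data: one correctly spelled target, two missing the final 'm'.
configured = ["www.example.com", "shop.example.co", "blog.example.co"]
results = [
    "https://www.example.com/login",
    "https://www.example.com/search?q=1",
]

for target in silent_targets(configured, results):
    print(f"no results for {target}: check spelling, DNS, and reachability")
```

A target with zero results isn't automatically broken, but 13 of 14 targets staying silent is exactly the kind of signal that should prompt a second look at the configuration.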
This is a reminder that the idea of build vs buy is a false choice. It would be more accurate to describe the choices as build alone versus build with others. There are few, if any, cybersecurity products on the market that don’t require the buyer to do significant work before the product can be useful. I call this the customization tax. This isn’t the vendor’s fault - every enterprise is different. Vendors can only do so much when building a product for a broad market.
The vendor has a lot of responsibility as well. The big three vulnerability scanners on the market don’t do a great job of correctly identifying IoT/OT devices. Scan a Ubiquiti device and they’re baffled - they’ll tell you it’s a Linux server running an end-of-life version of Debian. So, of course, there are vendors that specialize in only scanning IoT devices. You could even buy several complementary scanners and still have enormous gaps in your data.
In SecOps, detection engineering is the data challenge. Do we build broad or narrow detections? Are we getting all the necessary data to build the detections? Are there delays and bottlenecks in data collection and querying?
In third party risk management, you’ll never have time to perform deep due diligence and monitoring on all your third parties. Which vendors represent the biggest risks? Are you asking the right questions on your questionnaires? Are the responses accurate and trustworthy?
Everyone wants to make lemonade, because building sensors and gathering data is hard. Many buyers love making lemonade, because they start off with a mess of data and end with a nice dashboard with scores, prioritization, and metrics. When buyers see a vendor turn a million critical vulnerabilities into a ‘top 10 patch ASAP’ list, it feels like progress. Lemonade aims to be tasty, not healthy.
Making Lemonade Doesn’t Address Root Problems
Garbage in, garbage out. It’s a common phrase, but the challenge in cybersecurity is that we don’t have enough folks skilled in determining the quality of our data. Vendors and their data scientists get excited about markets where there’s a lot of data, because they don’t have to go out and create the data. It’s already there and ready to be analyzed, sorted, normalized, reduced, and summarized.
An important point: vendors’ products don’t become lemonade makers until the buyer feeds them lemons. It is largely on the buyer to ensure they’re not feeding bad data into the hopper. For example:
What if the customer fat-fingered one of their IP ranges? Instead of scanning 10.1.2.0/24, they’re scanning 100.1.2.0/24.
Perhaps there is also an external class C network the security team is unaware of, so it has never been scanned from the outside.
Security Rating Services only see a company’s external infrastructure, and often get companies’ assets confused and mixed up.
If you don’t pay for Salesforce Shield (reportedly 30% of your total Salesforce spend, ouch) and lack logs, you can’t build Salesforce-related detections.
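The first two examples can be checked mechanically if you keep an independent list of the address space you actually own (from cloud accounts, registrar records, or network documentation). The following is a minimal sketch using Python's standard ipaddress module. The function names and the sample ranges are assumptions for illustration, and the check is only as good as the owned-ranges list you feed it.

```python
import ipaddress


def out_of_scope_ranges(scan_ranges, owned_ranges):
    """Return scan ranges that fall outside every owned range (likely typos)."""
    owned = [ipaddress.ip_network(r) for r in owned_ranges]
    flagged = []
    for raw in scan_ranges:
        net = ipaddress.ip_network(raw, strict=False)
        # subnet_of() raises TypeError on mixed IPv4/IPv6, so compare versions first.
        inside = any(net.version == o.version and net.subnet_of(o) for o in owned)
        if not inside:
            flagged.append(raw)
    return flagged


def never_scanned(owned_ranges, scan_ranges):
    """Return owned ranges that no scan range touches at all."""
    scans = [ipaddress.ip_network(r, strict=False) for r in scan_ranges]
    untouched = []
    for raw in owned_ranges:
        net = ipaddress.ip_network(raw)
        if not any(net.version == s.version and net.overlaps(s) for s in scans):
            untouched.append(raw)
    return untouched


# Hypothetical inventory: one internal block and one external /24.
owned = ["10.0.0.0/8", "203.0.113.0/24"]
scans = ["10.1.2.0/24", "100.1.2.0/24"]

print(out_of_scope_ranges(scans, owned))  # ranges you scan but don't own
print(never_scanned(owned, scans))        # ranges you own but never scan
```

Note that never_scanned() only flags ranges with no overlap at all. A large block counts as touched even if a single /24 inside it is scanned, so a real coverage check would need to go finer than this.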
If the data is wrong or missing, there’s no way to magic a win out of it with AI or any other technology.
This is why edge devices get hacked, despite fixes being available for months or years before the attack. Perhaps they weren’t getting scanned. You can’t protect the assets you don’t know about.
This is why attackers are able to drop small Linux VMs on servers and desktops as a base of operations. Detections aren’t looking for WSL.exe in process lists, or new VMDKs showing up in %APPDATA%.
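To make the gap concrete, here is a rough sketch of what such a detection might look like over endpoint telemetry. The event schema (the type, image, and path fields) is hypothetical and doesn't correspond to any specific EDR or SIEM product. Legitimate WSL use will also match, so treat this as a starting point for tuning rather than a finished rule.

```python
def suspicious_vm_events(events):
    """Flag wsl.exe process launches and .vmdk files created under AppData."""
    hits = []
    for event in events:
        kind = event.get("type")
        if kind == "process":
            # Normalize separators and case, then compare the file name only.
            image = event.get("image", "").lower().replace("/", "\\")
            if image.rsplit("\\", 1)[-1] == "wsl.exe":
                hits.append(event)
        elif kind == "file_create":
            path = event.get("path", "").lower().replace("/", "\\")
            # %APPDATA% expands under a user's AppData folder; matching the
            # folder name also covers AppData\Local and AppData\LocalLow.
            if path.endswith(".vmdk") and "\\appdata\\" in path:
                hits.append(event)
    return hits


# Hypothetical telemetry records.
sample = [
    {"type": "process", "image": r"C:\Windows\System32\wsl.exe"},
    {"type": "process", "image": r"C:\Windows\System32\notepad.exe"},
    {"type": "file_create", "path": r"C:\Users\alice\AppData\Roaming\cache\disk0.vmdk"},
    {"type": "file_create", "path": r"D:\VMs\lab\disk0.vmdk"},
]

for hit in suspicious_vm_events(sample):
    print(hit)
```

None of this works unless process and file-creation events are being collected from those endpoints in the first place, which is the underlying data question.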
This is why the company that gets breached always has an A+ on some security rating service’s scorecard, and the ones that don’t get breached often have D’s or C’s. The rating services don’t have enough data to make an accurate call, but as long as some data exists, they’ll make lemonade.
Check Your Ingredients
To avoid making lemonade, you’ve got to check the quality of the data you’re giving to these ‘overlay’ cybersecurity products. If you’re planning to buy a product, and step 1 is to ingest data that another one of your products collected, stop and consider:
how comprehensive is this data?
is the collection of this data taking into account current attack scenarios and TTPs?
how accurate is this data (e.g. what are the false positive rates, and do your analysts trust it)?
how would I know what the quality of this data is (i.e. do I need to hire a third party expert to tell me?)
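Two of these questions can be given rough numbers before you buy anything. The sketch below assumes you have an asset inventory from a source independent of the tool being evaluated, plus analyst dispositions for a sample of its alerts. The function names and disposition labels are made up for illustration. The second function measures the share of reviewed alerts that turned out to be false positives, which is what most teams mean by "false positive rate" in practice.

```python
def coverage(inventory, collected):
    """Fraction of inventoried assets that appear in the collected data."""
    known = {asset.lower() for asset in inventory}
    if not known:
        return 0.0
    present = {asset.lower() for asset in collected}
    return len(known & present) / len(known)


def false_positive_share(dispositions):
    """Share of reviewed alerts that analysts marked as false positives."""
    reviewed = [d for d in dispositions if d in ("true_positive", "false_positive")]
    if not reviewed:
        return None  # nothing reviewed yet, so there is nothing to measure
    return reviewed.count("false_positive") / len(reviewed)


# Hypothetical inputs.
inventory = ["web01", "web02", "db01", "vpn-gw"]
scanned = ["WEB01", "db01"]
triage = ["false_positive", "true_positive", "false_positive", "unreviewed"]

print(f"coverage: {coverage(inventory, scanned):.0%}")
print(f"false positive share: {false_positive_share(triage):.0%}")
```

Neither number says whether the data reflects current attack techniques. That question still needs a person who knows both the environment and the threat landscape.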
We also have to be careful with metrics. The lemons and lemonade makers in the market are great at generating empty calories - metrics that look and feel like progress, but have no impact on your security program’s desired outcomes. You patched 100,000 vulnerabilities, congrats! Did any of these vulns represent any real risk to the business? Statistically, the answer is probably not. It feels great when that vuln count line goes down though. Lemonade is delicious.
But is your goal to satisfy a sweet tooth, or to get healthy?
You can’t magic good outcomes from bad data.
Conclusion
Dropzone and others are building some impressive scaffolding for automating mundane, repetitive SecOps tasks, but there is a question we have to ask before paying $9 an alert for agentic automation: “am I feeding it trash?”
Defenders need to give more attention to the market niches that focus on validating the quality of our security data: the products and services that help connect theory and best practice to reality.
Only when we’re sure of the quality of our controls and data can we get value out of our products, not lemonade.




