If you’re reading this, you’re likely weighing whether to build your own data pipeline or buy a solution like Adverity. And if a sales rep sent this your way, they’re probably hoping to steer you toward the latter.
Naturally, we’d love for you to use our product—it’s what we do. But the reality is that building your own solution comes with serious challenges and costs, some less obvious than others.
We won’t pretend it’s impossible. Some companies successfully build in-house systems that meet their needs and even save money. But for many, it’s more complex and costly than expected.
With that in mind, this guide isn’t just here to sell you on our product. It’s a candid look at what building an in-house solution really involves, especially the hidden challenges and expenses. If you do go that route, we want you to do so with open eyes.
After all, this is what we do every day. We understand the effort, the pitfalls, and what it really takes—and we’re proud of the work we put in.
And, if you do decide to build your own solution, we genuinely wish you the best. It’s a tough road, but it can be a rewarding one (we know, we’ve been there!).
We’ll be the first to admit that our methodology isn’t 100% scientific. That’s because, when it comes to building an in-house solution, the answer to most questions is: "it depends."
It depends on how many data sources you have. It depends on what they are. It depends on how much data you need to fetch and how often. It depends on the environment in which you’re using the data, the security and access measures required, and the expertise of your team. Ultimately, it depends on the full scope of what you’re trying to build.
Where possible, we’ve included estimated time and cost figures. However, this isn’t a one-size-fits-all assessment. The numbers will vary depending on your specific needs and resources.
Before we start, it's crucial to understand the difference between two fundamental concepts: APIs and data pipelines.
An Application Programming Interface (API) is a set of rules and protocols that allows one software application to interact with another. APIs define the methods and data formats that applications can use to communicate, enabling seamless data exchange and functionality sharing. For instance, a weather application on your smartphone uses APIs to retrieve data from remote servers, displaying up-to-date weather information.
However, while APIs facilitate access to data, they don't inherently manage the data's journey from source to destination. This is where data pipelines come into play.
A data pipeline is a series of processes that systematically move, transform, and manage data from one system to another. It encompasses the entire data flow, from collection and ingestion through processing and storage, ensuring that data is accessible, reliable, and ready for analysis. Unlike a simple API connection, a data pipeline handles complex tasks such as authorizations, data validation, enrichment, and aggregation.
So, when we talk about a data pipeline, this is what we mean.
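To make the distinction concrete, here’s a minimal, illustrative sketch in Python. The endpoint, field names, and local table are hypothetical stand-ins; the point is that the first function is just an API call, while the second wraps it in the basic pipeline steps of extraction, validation, transformation, and loading.

```python
import requests
import sqlite3

API_URL = "https://api.example.com/v1/ad_performance"  # hypothetical endpoint


def fetch_report(token: str) -> list[dict]:
    """Bare API call: one request, raw JSON rows back."""
    response = requests.get(
        API_URL, headers={"Authorization": f"Bearer {token}"}, timeout=30
    )
    response.raise_for_status()
    return response.json()["rows"]


def run_pipeline(token: str, db_path: str = "marketing.db") -> None:
    """A pipeline wraps the API call with validation, transformation, and loading."""
    rows = fetch_report(token)                                   # extract
    rows = [r for r in rows if r.get("spend") is not None]       # validate: drop incomplete rows
    cleaned = [(r["date"], r["campaign"], float(r["spend"])) for r in rows]  # transform
    with sqlite3.connect(db_path) as conn:                       # load
        conn.execute("CREATE TABLE IF NOT EXISTS spend (date TEXT, campaign TEXT, spend REAL)")
        conn.executemany("INSERT INTO spend VALUES (?, ?, ?)", cleaned)
```

Even this toy version hints at the extra questions a real pipeline has to answer: where the token comes from, what happens when the request fails, and where the data ultimately lives.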
Setting up an API connection isn’t particularly difficult. There are plenty of open-source solutions available, and even with some customization, an experienced engineer can typically set one up in a few days.
However, that’s just the start. The real challenges (and costs) come once you have that connection in place and it’s time to build your data pipeline. Why? Because, after the initial API connection, you need to address a range of critical questions:
1. Authorization & Access
How will you authorize connections? Most, if not all, data sources require some sort of authorization in order to access the data in the first place. While there are many different systems, about 80% of data sources use token-based authentication.
If you don’t automate authorization, you’ll spend a huge amount of time manually requesting tokens and managing access. Depending on how much data you need to access and how often, having a system to automatically access your data is critical.
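To give a sense of what automating this looks like, here’s a rough sketch of an OAuth-style token refresh helper. The token URL and credential fields are placeholders; every platform has its own flow and naming.

```python
import time
import requests

TOKEN_URL = "https://auth.example.com/oauth2/token"  # placeholder: each platform has its own


class TokenManager:
    """Caches an access token and refreshes it automatically before it expires."""

    def __init__(self, client_id: str, client_secret: str, refresh_token: str):
        self.client_id = client_id
        self.client_secret = client_secret
        self.refresh_token = refresh_token
        self._access_token = None
        self._expires_at = 0.0

    def get_token(self) -> str:
        # Refresh a minute early so in-flight requests never use a stale token.
        if self._access_token is None or time.time() > self._expires_at - 60:
            resp = requests.post(TOKEN_URL, data={
                "grant_type": "refresh_token",
                "client_id": self.client_id,
                "client_secret": self.client_secret,
                "refresh_token": self.refresh_token,
            }, timeout=30)
            resp.raise_for_status()
            payload = resp.json()
            self._access_token = payload["access_token"]
            self._expires_at = time.time() + payload.get("expires_in", 3600)
        return self._access_token
```

Multiply something like this by every account, platform, and authentication scheme you work with, and the value of centralizing it becomes obvious.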
At Adverity, we have a whole bunch of features to make this simpler for our users, and you can read all about the benefits of centralized authorization here.
2. API Request Limits
Many APIs impose limits on the number of data requests you can make per day or even per minute. These can get quite complicated depending on what access level or account you hold, and generally, each platform measures limits in different ways. However, what this means is that there is a limit on how much data you can fetch from any given platform, and if you exceed these limits, your data pipeline could fail or require workarounds.
Importantly, these limits also can, and do, change, such as in 2023 when changes to GA4’s API limits caused a lot of problems for data engineers across the industry as they sought to find workarounds (you can read more about this here).
Again, at Adverity, we have a number of different techniques and strategies for solving this challenge, so that our customers never have to think, let alone worry, about rate limits. These include a lot of complicated-sounding terms such as intelligent throttling management and data chunking. If you are that way inclined, you can learn more about them here.
Figure 1: Examples of rate limits and quotas for popular data sources
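To give a flavor of what handling these limits involves, below is a simple sketch of one common tactic: retrying with exponential backoff when an API answers with HTTP 429 (too many requests). A real pipeline would also need per-platform quota tracking and request chunking on top of this.

```python
import time
import requests


def get_with_backoff(url: str, headers: dict, max_retries: int = 5) -> requests.Response:
    """Retry a request with exponential backoff when the API returns 429 (rate limited)."""
    delay = 1.0
    for attempt in range(max_retries):
        response = requests.get(url, headers=headers, timeout=30)
        if response.status_code != 429:
            response.raise_for_status()
            return response
        # Honour the Retry-After header if the platform provides one.
        wait = float(response.headers.get("Retry-After", delay))
        time.sleep(wait)
        delay *= 2  # back off more aggressively on each retry
    raise RuntimeError(f"Still rate-limited after {max_retries} attempts: {url}")
```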
3. API Rules & Restrictions
Each API comes with its own set of rules and restrictions. Some may limit the type of data you can access or have strict compliance requirements. All these rules and restrictions add additional layers of complexity that need to be tackled to successfully and regularly extract your data.
4. Data Security
What security measures need to be in place? Who will be able to access the data? If you’re handling PII (Personally Identifiable Information), compliance with regulations like GDPR and CCPA is critical, and failure to comply could result in hefty fines. In most cases, simply extracting data to a spreadsheet will not be enough to ensure compliance, so you will have to consider how and where your data is stored and how it is accessed.
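As one illustration (and certainly not a substitute for legal or compliance advice), a common building block is pseudonymizing PII fields before the data ever reaches storage, for example by replacing identifiers with a keyed hash. The field names below are purely illustrative.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_salt: bytes) -> str:
    """Replace a PII value (e.g. an email address) with a keyed, irreversible hash."""
    return hmac.new(secret_salt, value.encode("utf-8"), hashlib.sha256).hexdigest()


def scrub_rows(rows: list[dict], pii_fields: tuple[str, ...], secret_salt: bytes) -> list[dict]:
    """Pseudonymize the configured PII columns in every row before loading."""
    scrubbed = []
    for row in rows:
        clean = dict(row)
        for field in pii_fields:
            if field in clean and clean[field] is not None:
                clean[field] = pseudonymize(str(clean[field]), secret_salt)
        scrubbed.append(clean)
    return scrubbed
```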
5. Data Storage
Where will the data be stored? There are a number of options for storing your data. You could choose a data lake solution such as Amazon S3 or Databricks, or a data warehouse solution such as Snowflake or Google BigQuery. However, working with any of these destinations requires additional work to connect your data source APIs to the destination’s API. Whatever solution you choose, you will need to determine how your data interacts with that chosen environment.
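As a rough sketch of that destination-side work, here’s what loading a batch of already-transformed rows into Google BigQuery might look like. The table name is hypothetical, and S3, Snowflake, or Databricks would each need their own client libraries and loading logic.

```python
from google.cloud import bigquery  # requires the google-cloud-bigquery package


def load_rows_to_bigquery(rows: list[dict],
                          table_id: str = "my_project.marketing.ad_spend") -> None:
    """Stream a batch of transformed rows into an existing BigQuery table."""
    client = bigquery.Client()
    errors = client.insert_rows_json(table_id, rows)  # table and schema must already exist
    if errors:
        raise RuntimeError(f"BigQuery rejected some rows: {errors}")
```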
6. Data Mapping & Integration
If you’re collecting data from multiple data sources (and let’s face it, you probably are), then it needs to be correctly mapped and integrated before you can use it; otherwise, you’ll be attempting to compare apples to oranges. Now, this process can be done manually, but even with a small amount of data, it is time-consuming and prone to human error. With any substantial amount of data, the task becomes almost impossible, making automated mapping and integration an essential component of any modern marketing data pipeline. It’s not just about fetching the data; it’s about making it usable.
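To show what mapping and integration mean in practice, here’s a simplified sketch that harmonizes two sources reporting the same concepts under different field names and units into one shared schema. The field names are illustrative rather than exact.

```python
import pandas as pd

# Each source reports the same concepts under different names and conventions.
FIELD_MAP = {
    "facebook": {"spend": "spend", "day": "date", "campaign_name": "campaign"},
    "google_ads": {"cost_micros": "spend", "segments_date": "date", "campaign": "campaign"},
}


def harmonize(source: str, df: pd.DataFrame) -> pd.DataFrame:
    """Rename source-specific columns to a shared schema and normalize units."""
    df = df.rename(columns=FIELD_MAP[source])
    if source == "google_ads":
        df["spend"] = df["spend"] / 1_000_000  # illustrative: cost reported in micros
    df["source"] = source
    return df[["date", "source", "campaign", "spend"]]


# combined = pd.concat([harmonize("facebook", fb_df), harmonize("google_ads", gads_df)])
```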
And all of that is just the initial setup. It doesn’t cover everything a data pipeline can do, only what, at its most basic, a data pipeline should do. It’s also just for one data pipeline: the process needs to be repeated for every single data source you want to connect to.
Once your system is set up, you’ll need to maintain it. This requires ongoing resources and costs. In fact, this constitutes the biggest long-term financial commitment when building your own solution and the one that is often overlooked.
1. API Updates
APIs are frequently updated, which means that every API you work with will change over time. If you don’t update your data pipeline accordingly, it could, and most likely will, break, leaving you with inaccurate or missing data. Of course, not all APIs are created equal; they are updated with varying regularity and scale. Some are updated very frequently, others less than once a year. Some updates are minor and innocuous, while others are large-scale.
From a cost point of view, this comes back to one of those ‘it depends’ questions regarding which data sources you want to use. Either way, it’s essential to have a team in place monitoring API updates and updating your data pipelines accordingly.
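One small safeguard such a team might put in place is a schema check that flags when an API response no longer matches what the pipeline expects, so a breaking update is caught before it quietly corrupts downstream data. A rough sketch, with an illustrative field list:

```python
import logging

logger = logging.getLogger("pipeline.schema_watch")

EXPECTED_FIELDS = {"date", "campaign", "impressions", "clicks", "spend"}  # per-connector contract


def check_response_schema(rows: list[dict], connector: str) -> None:
    """Warn loudly if an API update added or removed fields the pipeline relies on."""
    if not rows:
        return
    actual = set(rows[0].keys())
    missing = EXPECTED_FIELDS - actual
    unexpected = actual - EXPECTED_FIELDS
    if missing:
        # A removed field usually means a breaking API change: fail fast rather than load bad data.
        raise ValueError(f"{connector}: fields missing from API response: {sorted(missing)}")
    if unexpected:
        logger.warning("%s: new fields in API response (possible API update): %s",
                       connector, sorted(unexpected))
```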
In fact, as a business, API maintenance constitutes quite a large part of what we do, and we spare no resources to ensure that we know when every update will occur, what it consists of, and that all our connectors are updated. Unexciting? Possibly. Essential? Absolutely.
Figure 2: Examples of API update schedules for popular data sources
2. Security & Compliance
Unsurprisingly, security and compliance are not issues that go away after the initial setup; they are factors that your team needs to be constantly vigilant about. Not least because APIs are a common target for cyberattacks. So, who is going to be responsible for securing your API connections, and how will your system handle DNS attacks, data breaches, or other security threats? Again, the time and resources spent on this will multiply with every data source and pipeline you need to maintain.
3. Ongoing data monitoring
Maintaining a data pipeline isn’t a "set it and forget it" job. You’ll need someone (or a team) to manage authorizations, integration issues, and troubleshoot errors if and when they arise. At the same time, data needs to be constantly monitored for accuracy, duplication, and general data quality to ensure your data is fit for purpose.
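To give a flavor of what that monitoring involves, here’s a simplified sketch of a few basic quality checks (duplicates, missing values, freshness) run against each freshly loaded batch; production setups typically go much further.

```python
import pandas as pd


def quality_report(df: pd.DataFrame,
                   date_col: str = "date",
                   key_cols: tuple = ("date", "source", "campaign")) -> dict:
    """Return a handful of basic data quality indicators for a freshly loaded batch."""
    return {
        "row_count": len(df),
        "duplicate_rows": int(df.duplicated(subset=list(key_cols)).sum()),
        "null_values": int(df.isna().sum().sum()),
        "latest_date": str(pd.to_datetime(df[date_col]).max().date()) if len(df) else None,
    }


# report = quality_report(combined)
# if report["duplicate_rows"] or report["null_values"]:
#     ...alert the pipeline owner before the data reaches any dashboards
```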
As already mentioned, with so many ‘it depends,’ the initial and ongoing costs of your in-house build are going to vary drastically from company to company.
Nonetheless, you’ll need some specific skills and experience. You'll need someone who understands operating systems (like Linux), system security, and interfacing; someone who knows how APIs work and can maintain them; someone with web development skills to integrate the data; and someone with the data knowledge to transform raw files into meaningful, usable insights.
For simplicity’s sake, we’ve identified the following roles we think are essential to an average-sized build, along with what skills they need and their average compensation (base salary + benefits) in the USA.
Data Engineer
Average compensation: $150,000 (base salary + benefits)
Skills and responsibilities: Designs, builds, and maintains scalable data pipelines, architectures, and systems to enable efficient data processing and analysis.

Software Developer
Average compensation: $166,000 (base salary + benefits)
Skills and responsibilities: Designs, codes, tests, and maintains software applications to meet user and business needs.

System Engineer
Average compensation: $138,000 (base salary + benefits)
Skills and responsibilities: Designs system architecture, troubleshoots issues, performs upgrades, manages backups, plans capacity, and ensures system standards are met.
While bearing in mind all the ‘it depends’ questions, below we’ve provided rough estimates on the average time it takes to set up and maintain 10 data pipelines, based on our own experience. Why 10? Well, according to our research, 99% of marketing teams utilize at least 10 data sources.
Naturally, these estimates will vary from company to company, so we provide them only as a rough guide.
Figure 3: Estimated initial set-up costs for 10 data sources
Figure 4: Estimated annual ongoing maintenance costs for 10 data sources
This brings us to a conservative estimate of roughly $26k for initial set-up and then an ongoing annual maintenance cost of around $90k.
In reality, this cost will be higher or lower depending on your company, your needs, your current setup, and how many and which data sources you want to work with.
To this end, we’ve developed a calculator to help you get a more accurate estimate of what an internal build would cost your business. You can also adjust the expected salaries for each role, as we are aware that in different regions, these costs will vary.
Cost is one thing, but the time to completion is also crucial. Any downtime as you wait for a solution to become operational will impact your business and raise the question of what solution you will use in the interim.
To some extent, this can be mitigated by simply hiring or contracting more resources to expedite the process. However, of course, this needs to be weighed against the additional costs of sourcing and managing those resources.
That said, with a team consisting of a Data Engineer, a Software Developer, and a System Engineer, we estimate an average-sized solution comprising 10 data pipelines would take a minimum of 8 weeks, or about 2 months.
Importantly, if you want to scale by adding new data sources, then costs and resource allocation will need to grow accordingly.
Yes. While we have a vested interest in showing a high cost for an internal build, this is roughly what it costs. And, a significant amount of this cost goes towards the ongoing maintenance of your solution.
This is why we have a whole army of engineers and developers whose sole job is to manage and maintain our API connectors. And we need to, because we have more than 600 of them!
The above estimates are based on what we consider the core costs involved in an in-house solution. However, there are a number of other factors and potential risks that, while less quantifiable in terms of actual cost, should nonetheless be considered by any business embarking on this type of project.
1. Administrative costs
Building an in-house solution requires an in-house team. Finding the right people takes time and resources, whether full-time, part-time, or contractors. Even if you already have the right people in-house, embarking on this project will take them away from other projects. Moreover, ongoing maintenance tasks will always require a dedicated team.
Once you’ve built an in-house team, someone also needs to manage them, track progress, coordinate workflows, ensure deliverables are met, and conduct performance reviews, and all these things take time away from other work.
Employees may require benefits like pensions, insurance, and training, plus there are also ongoing HR costs for payroll, compliance, and contract management.
These administrative overheads may seem insignificant; however, they can add up quickly, making an in-house solution far more expensive than it might initially seem.
2. Staff access
How will users access and interact with the data? Data democratization, getting the right data to the right people at the right time, is a crucial part of any data set-up. One solution is to build your own platform to allow marketers and other teams to more easily retrieve and analyze data. However, without a well-designed UI, users may struggle to navigate the system, increasing reliance on engineers and limiting data accessibility.
Beyond development, someone needs to manage this platform—handling updates, troubleshooting issues, and ensuring smooth operations. Then there’s training. Staff need to understand how to use the system effectively, and that means ongoing documentation, user guides, and support. Technical documentation isn’t a one-and-done task; it requires continuous updates and dedicated resources to maintain, adding yet another layer of cost and complexity.
3. Scalability
How will your in-house solution grow with your business? Without proper scalability, what works today may become slow, inefficient, or even obsolete as data volumes increase. More users, more data sources, and evolving business needs can strain your infrastructure, leading to performance issues and costly rework.
Managing scalability isn’t just about adding more storage—it requires planning for increased processing power, optimizing data pipelines, and ensuring systems can handle growing demand. Without a clear strategy, your in-house build could quickly become a bottleneck rather than a solution, requiring ongoing investment just to keep up.
4. Staff churn
What happens if the people who built and maintain your in-house solution leave? When key staff members go, they take critical knowledge with them—especially if documentation isn’t consistently maintained. While this risk will depend on the size of the team, internally built systems often rely on a few key individuals, creating single points of failure should they leave. This means that updates, troubleshooting, and even day-to-day operations can become major challenges.
While building an in-house data solution is possible, this doesn’t necessarily mean it is easy or cost-effective. Of course, the specific costs and timelines in this report won’t apply to everyone. Some businesses may have existing infrastructure or specialized staff that make things easier.
Nonetheless, the core challenges are universal—maintenance, security, scalability, staff turnover. These impact every in-house project.
If you do decide to take the in-house route, we urge you not to underestimate these challenges, the unforeseen costs, and the long-term business risks they can lead to.
Trust us, we’ve been there!
While we stand by everything we’ve said above, naturally, this document is designed to convince you to take the buy route and, hopefully, buy Adverity.
And the truth is that by opting for an off-the-shelf solution like Adverity, you can eliminate many of the costs and risks outlined above.
Rather than managing the complexities of an in-house build, you can instead rely on a team whose sole business is in building and maintaining effective data pipelines alongside a whole host of additional features designed to simplify data management for you and your business.
While Adverity isn’t free, in most cases it will work out considerably cheaper than an in-house build once all the costs and potential costs are considered.
API updates, system updates, security updates; with Adverity, you don’t need to worry about maintaining your data pipelines. It’s that simple.
Data privacy regulations are met without the need for dedicated internal compliance resources.
Adverity is built to accommodate a vast range of business sizes, meaning that scaling with your business needs is much simpler.
Alongside your data pipeline requirements, you also get access to a range of advanced features to maximize the value from your data.
Instead of spending months (or years) building a system, businesses can start leveraging their data far quicker.
If internal engineers leave, an in-house system can become unmanageable. With Adverity, continuity is guaranteed.
No need to worry about troubleshooting, API failures, or downtime. Expert support is always available.
Regular updates, new features, and optimizations are ongoing, ensuring the platform remains state-of-the-art.
Ultimately, choosing a solution like Adverity means you’re not just getting a product—you’re getting an entire company committed to making data work for you.
Instead of spending valuable time maintaining infrastructure, troubleshooting issues, and keeping up with the latest API changes, you can focus on what truly matters: growing your business, optimizing marketing campaigns, and making data-driven decisions.