If you’re reading this, you’re likely weighing whether to build your own data pipeline or buy a solution like Adverity. And if a sales rep sent this your way, they’re probably hoping to steer you toward the latter.
Naturally, we’d love for you to use our product—it’s what we do. But the reality is that building your own solution comes with serious challenges and costs, some less obvious than others.
We won’t pretend it’s impossible. Some companies successfully build in-house systems that meet their needs and even save money. But for many, it’s more complex and costly than expected.
With that in mind, this guide isn’t just here to sell you on our product. It’s a candid look at what building an in-house solution really involves, especially the hidden challenges and expenses. If you do go that route, we want you to do so with open eyes.
After all, this is what we do every day. We understand the effort, the pitfalls, and what it really takes—and we’re proud of the work we put in.
And, if you do decide to build your own solution, we genuinely wish you the best. It’s a tough road, but it can be a rewarding one (we know, we’ve been there!).
We’ll be the first to admit that our methodology isn’t 100% scientific. That’s because, when it comes to building an in-house solution, the answer to most questions is: "it depends."
It depends on how many data sources you have. It depends on what they are. It depends on how much data you need to fetch and how often. It depends on the environment in which you’re using the data, the security and access measures required, and the expertise of your team. Ultimately, it depends on the full scope of what you’re trying to build.
Where possible, we’ve included estimated time and cost figures. However, this isn’t a one-size-fits-all assessment. The numbers will vary depending on your specific needs and resources.
Before we start, it's crucial to understand the difference between two fundamental concepts: APIs and data pipelines.
An Application Programming Interface (API) is a set of rules and protocols that allows one software application to interact with another. APIs define the methods and data formats that applications can use to communicate, enabling seamless data exchange and functionality sharing. For instance, a weather application on your smartphone uses APIs to retrieve data from remote servers, displaying up-to-date weather information.
However, while APIs facilitate access to data, they don't inherently manage the data's journey from source to destination. This is where data pipelines come into play.
A data pipeline is a series of processes that systematically move, transform, and manage data from one system to another. It encompasses the entire data flow, from collection and ingestion through processing and storage, ensuring that data is accessible, reliable, and ready for analysis. Unlike a simple API connection, a data pipeline handles complex tasks such as authorizations, data validation, enrichment, and aggregation.
So, when we talk about a data pipeline, this is what we mean.
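To make the distinction concrete, here’s a minimal, illustrative sketch in Python. The endpoint, field names, and local table are hypothetical stand-ins; the point is that the first function is just an API call, while the second wraps it in the basic pipeline steps of extraction, validation, transformation, and loading.

```python
import requests
import sqlite3

API_URL = "https://api.example.com/v1/ad_performance"  # hypothetical endpoint


def fetch_report(token: str) -> list[dict]:
    """Bare API call: one request, raw JSON rows back."""
    response = requests.get(
        API_URL, headers={"Authorization": f"Bearer {token}"}, timeout=30
    )
    response.raise_for_status()
    return response.json()["rows"]


def run_pipeline(token: str, db_path: str = "marketing.db") -> None:
    """A pipeline wraps the API call with validation, transformation, and loading."""
    rows = fetch_report(token)                                   # extract
    rows = [r for r in rows if r.get("spend") is not None]       # validate: drop incomplete rows
    cleaned = [(r["date"], r["campaign"], float(r["spend"])) for r in rows]  # transform
    with sqlite3.connect(db_path) as conn:                       # load
        conn.execute("CREATE TABLE IF NOT EXISTS spend (date TEXT, campaign TEXT, spend REAL)")
        conn.executemany("INSERT INTO spend VALUES (?, ?, ?)", cleaned)
```

Even this toy version hints at the extra questions a real pipeline has to answer: where the token comes from, what happens when the request fails, and where the data ultimately lives.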
Setting up an API connection isn’t particularly difficult. There are plenty of open-source solutions available, and even with some customization, an experienced engineer can typically set one up in a few days.
However, that’s just the start. The real challenges (and costs) come once you have that connection in place and it’s time to build your data pipeline. Why? Because, after the initial API connection, you need to address a range of critical questions:
1. Authorization & Access
How will you authorize connections? Most, if not all, data sources require some sort of authorization in order to access the data in the first place. While there are many different systems, about 80% of data sources use token-based authentication.
If you don’t automate authorization, you’ll spend a huge amount of time manually requesting tokens and managing access. Depending on how much data you need to access and how often, having a system to automatically access your data is critical.
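To give a sense of what automating this looks like, here’s a rough sketch of an OAuth-style token refresh helper. The token URL and credential fields are placeholders; every platform has its own flow and naming.

```python
import time
import requests

TOKEN_URL = "https://auth.example.com/oauth2/token"  # placeholder: each platform has its own


class TokenManager:
    """Caches an access token and refreshes it automatically before it expires."""

    def __init__(self, client_id: str, client_secret: str, refresh_token: str):
        self.client_id = client_id
        self.client_secret = client_secret
        self.refresh_token = refresh_token
        self._access_token = None
        self._expires_at = 0.0

    def get_token(self) -> str:
        # Refresh a minute early so in-flight requests never use a stale token.
        if self._access_token is None or time.time() > self._expires_at - 60:
            resp = requests.post(TOKEN_URL, data={
                "grant_type": "refresh_token",
                "client_id": self.client_id,
                "client_secret": self.client_secret,
                "refresh_token": self.refresh_token,
            }, timeout=30)
            resp.raise_for_status()
            payload = resp.json()
            self._access_token = payload["access_token"]
            self._expires_at = time.time() + payload.get("expires_in", 3600)
        return self._access_token
```

Multiply something like this by every account, platform, and authentication scheme you work with, and the value of centralizing it becomes obvious.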
At Adverity, we have a whole bunch of features to make this simpler for our users, and you can read all about the benefits of centralized authorization here.
2. API Request Limits
Many APIs impose limits on the number of data requests you can make per day or even per minute. These can get quite complicated depending on what access level or account you hold, and generally, each platform measures limits in different ways. However, what this means is that there is a limit on how much data you can fetch from any given platform, and if you exceed these limits, your data pipeline could fail or require workarounds.
Importantly, these limits also can, and do, change, such as in 2023 when changes to GA4’s API limits caused a lot of problems for data engineers across the industry as they sought to find workarounds (you can read more about this here).
Again, at Adverity, we have a number of different techniques and strategies for solving this challenge, so that our customers never have to think, let alone worry, about rate limits. These include a lot of complicated-sounding terms such as intelligent throttling management and data chunking. If you are that way inclined, you can learn more about them here.
Figure 1: Examples of rate limits and quotas for popular data sources
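To give a flavor of what handling these limits involves, below is a simple sketch of one common tactic: retrying with exponential backoff when an API answers with HTTP 429 (too many requests). A real pipeline would also need per-platform quota tracking and request chunking on top of this.

```python
import time
import requests


def get_with_backoff(url: str, headers: dict, max_retries: int = 5) -> requests.Response:
    """Retry a request with exponential backoff when the API returns 429 (rate limited)."""
    delay = 1.0
    for attempt in range(max_retries):
        response = requests.get(url, headers=headers, timeout=30)
        if response.status_code != 429:
            response.raise_for_status()
            return response
        # Honour the Retry-After header if the platform provides one.
        wait = float(response.headers.get("Retry-After", delay))
        time.sleep(wait)
        delay *= 2  # back off more aggressively on each retry
    raise RuntimeError(f"Still rate-limited after {max_retries} attempts: {url}")
```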
3. API Rules & Restrictions
Each API comes with its own set of rules and restrictions. Some may limit the type of data you can access or have strict compliance requirements. All these rules and restrictions add additional layers of complexity that need to be tackled to successfully and regularly extract your data.
4. Data Security
What security measures need to be in place? Who will be able to access the data? If you’re handling PII (Personally Identifiable Information), compliance with regulations like GDPR and CCPA is critical, and failure to comply could result in hefty fines. In most cases, simply extracting data to a spreadsheet will not be enough to ensure compliance, so you will have to consider how and where your data is stored and how it is accessed.
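As one illustration (and certainly not a substitute for legal or compliance advice), a common building block is pseudonymizing PII fields before the data ever reaches storage, for example by replacing identifiers with a keyed hash. The field names below are purely illustrative.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_salt: bytes) -> str:
    """Replace a PII value (e.g. an email address) with a keyed, irreversible hash."""
    return hmac.new(secret_salt, value.encode("utf-8"), hashlib.sha256).hexdigest()


def scrub_rows(rows: list[dict], pii_fields: tuple[str, ...], secret_salt: bytes) -> list[dict]:
    """Pseudonymize the configured PII columns in every row before loading."""
    scrubbed = []
    for row in rows:
        clean = dict(row)
        for field in pii_fields:
            if field in clean and clean[field] is not None:
                clean[field] = pseudonymize(str(clean[field]), secret_salt)
        scrubbed.append(clean)
    return scrubbed
```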
5. Data Storage
Where will the data be stored? There are a number of options for storing your data. You could choose a data lake solution such as Amazon S3 or Databricks, or a data warehouse solution such as Snowflake or Google BigQuery. However, working with any of these destinations requires additional work to connect your data source APIs to the destination’s API. Whatever solution you choose, you will need to determine how your data interacts with that chosen environment.
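As a rough sketch of that destination-side work, here’s what loading a batch of already-transformed rows into Google BigQuery might look like. The table name is hypothetical, and S3, Snowflake, or Databricks would each need their own client libraries and loading logic.

```python
from google.cloud import bigquery  # requires the google-cloud-bigquery package


def load_rows_to_bigquery(rows: list[dict],
                          table_id: str = "my_project.marketing.ad_spend") -> None:
    """Stream a batch of transformed rows into an existing BigQuery table."""
    client = bigquery.Client()
    errors = client.insert_rows_json(table_id, rows)  # table and schema must already exist
    if errors:
        raise RuntimeError(f"BigQuery rejected some rows: {errors}")
```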
6. Data Mapping & Integration
If you’re collecting data from multiple data sources (and let’s face it, you probably are), then it needs to be correctly mapped and integrated before you can use it; otherwise, you’ll be attempting to compare apples to oranges. Now, this process can be done manually, but even with a small amount of data, it is time-consuming and prone to human error. With any substantial amount of data, the task becomes almost impossible, making automated mapping and integration an essential component of any modern marketing data pipeline. It’s not just about fetching the data; it’s about making it usable.
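To show what mapping and integration mean in practice, here’s a simplified sketch that harmonizes two sources reporting the same concepts under different field names and units into one shared schema. The field names are illustrative rather than exact.

```python
import pandas as pd

# Each source reports the same concepts under different names and conventions.
FIELD_MAP = {
    "facebook": {"spend": "spend", "day": "date", "campaign_name": "campaign"},
    "google_ads": {"cost_micros": "spend", "segments_date": "date", "campaign": "campaign"},
}


def harmonize(source: str, df: pd.DataFrame) -> pd.DataFrame:
    """Rename source-specific columns to a shared schema and normalize units."""
    df = df.rename(columns=FIELD_MAP[source])
    if source == "google_ads":
        df["spend"] = df["spend"] / 1_000_000  # illustrative: cost reported in micros
    df["source"] = source
    return df[["date", "source", "campaign", "spend"]]


# combined = pd.concat([harmonize("facebook", fb_df), harmonize("google_ads", gads_df)])
```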
And all of that is just the initial setup. It doesn’t cover everything a data pipeline can do, only what, at its most basic, a data pipeline should do. It’s also just for one data pipeline: the process needs to be repeated for every single data source you want to connect to.
Once your system is set up, you’ll need to maintain it. This requires ongoing resources and costs. In fact, this constitutes the biggest long-term financial commitment when building your own solution and the one that is often overlooked.
1. API Updates
APIs are frequently updated, which means that every API you work with will change over time. If you don’t update your data pipeline accordingly, it could, and most likely will, break, leaving you with inaccurate or missing data. Of course, not all APIs are created equal; they are updated with varying regularity and scale. Some are updated very frequently, others less than once a year. Some updates are minor and innocuous, while others are large-scale.
From a cost point of view, this comes back to one of those ‘it depends’ questions regarding which data sources you want to use. Either way, it’s essential to have a team in place monitoring API updates and updating your data pipelines accordingly.
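One small safeguard such a team might put in place is a schema check that flags when an API response no longer matches what the pipeline expects, so a breaking update is caught before it quietly corrupts downstream data. A rough sketch, with an illustrative field list:

```python
import logging

logger = logging.getLogger("pipeline.schema_watch")

EXPECTED_FIELDS = {"date", "campaign", "impressions", "clicks", "spend"}  # per-connector contract


def check_response_schema(rows: list[dict], connector: str) -> None:
    """Warn loudly if an API update added or removed fields the pipeline relies on."""
    if not rows:
        return
    actual = set(rows[0].keys())
    missing = EXPECTED_FIELDS - actual
    unexpected = actual - EXPECTED_FIELDS
    if missing:
        # A removed field usually means a breaking API change: fail fast rather than load bad data.
        raise ValueError(f"{connector}: fields missing from API response: {sorted(missing)}")
    if unexpected:
        logger.warning("%s: new fields in API response (possible API update): %s",
                       connector, sorted(unexpected))
```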
In fact, as a business, API maintenance constitutes quite a large part of what we do, and we spare no resources to ensure that we know when every update will occur, what it consists of, and that all our connectors are updated. Unexciting? Possibly. Essential? Absolutely.
Figure 2: Examples of API update schedules for popular data sources
2. Security & Compliance
Unsurprisingly, security and compliance are not issues that go away after the initial setup; they are factors that your team needs to be constantly vigilant about. Not least because APIs are a common target for cyberattacks. So, who is going to be responsible for securing your API connections, and how will your system handle DNS attacks, data breaches, or other security threats? Again, the time and resources spent on this will multiply with every data source and pipeline you need to maintain.
3. Ongoing data monitoring
Maintaining a data pipeline isn’t a "set it and forget it" job. You’ll need someone (or a team) to manage authorizations, integration issues, and troubleshoot errors if and when they arise. At the same time, data needs to be constantly monitored for accuracy, duplication, and general data quality to ensure your data is fit for purpose.
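To give a flavor of what that monitoring involves, here’s a simplified sketch of a few basic quality checks (duplicates, missing values, freshness) run against each freshly loaded batch; production setups typically go much further.

```python
import pandas as pd


def quality_report(df: pd.DataFrame,
                   date_col: str = "date",
                   key_cols: tuple = ("date", "source", "campaign")) -> dict:
    """Return a handful of basic data quality indicators for a freshly loaded batch."""
    return {
        "row_count": len(df),
        "duplicate_rows": int(df.duplicated(subset=list(key_cols)).sum()),
        "null_values": int(df.isna().sum().sum()),
        "latest_date": str(pd.to_datetime(df[date_col]).max().date()) if len(df) else None,
    }


# report = quality_report(combined)
# if report["duplicate_rows"] or report["null_values"]:
#     ...alert the pipeline owner before the data reaches any dashboards
```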
As already mentioned, with so many ‘it depends,’ the initial and ongoing costs of your in-house build are going to vary drastically from company to company.
Nonetheless, you’ll need some specific skills and experience. You'll need someone who understands operating systems (like Linux), system security, and interfacing; someone who knows how APIs work and can maintain them; someone with web development skills to integrate the data; and someone with the data knowledge to transform raw files into meaningful, usable insights.
For simplicity’s sake, we’ve identified the following roles we think are essential to an average-sized build, along with what skills they need and their average compensation (base salary + benefits) in the USA.
Data Engineer
Average compensation: $150,000 (base salary + benefits)
Skills and responsibilities: Designs, builds, and maintains scalable data pipelines, architectures, and systems to enable efficient data processing and analysis.

Software Developer
Average compensation: $166,000 (base salary + benefits)
Skills and responsibilities: Designs, codes, tests, and maintains software applications to meet user and business needs.

System Engineer
Average compensation: $138,000 (base salary + benefits)
Skills and responsibilities: Designs system architecture, troubleshoots issues, performs upgrades, manages backups, plans capacity, and ensures system standards are met.
While bearing in mind all the ‘it depends’ questions, below we’ve provided rough estimates on the average time it takes to set up and maintain 10 data pipelines, based on our own experience. Why 10? Well, according to our research, 99% of marketing teams utilize at least 10 data sources.
Naturally, these estimates will vary from company to company, so we provide them only as a rough guide.
Figure 3: Estimated initial set-up costs for 10 data sources
Figure 4: Estimated annual ongoing maintenance costs for 10 data sources
This brings us to a conservative estimate of roughly $26k for initial set-up and then an ongoing annual maintenance cost of around $90k.
In reality, this cost will be higher or lower depending on your company, your needs, your current setup, and how many and which data sources you want to work with.
To this end, we’ve developed a calculator to help you get a more accurate estimate of what an internal build would cost your business. You can also adjust the expected salaries for each role, as we are aware that in different regions, these costs will vary.
Cost is one thing, but the time to completion is also crucial. Any downtime as you wait for a solution to become operational will impact your business and raise the question of what solution you will use in the interim.
To some extent, this can be mitigated by simply hiring or contracting more resources to expedite the process. However, of course, this needs to be weighed against the additional costs of sourcing and managing those resources.
That said, with a team consisting of a Data Engineer, a Software Developer, and a System Engineer, we estimate an average-sized solution comprising 10 data pipelines would take a minimum of 8 weeks, or about 2 months.
Importantly, if you want to scale by adding new data sources, then costs and resource allocation will need to grow accordingly.
Yes. While we have a vested interest in showing a high cost for an internal build, this is roughly what it costs. And, a significant amount of this cost goes towards the ongoing maintenance of your solution.
This is why we have a whole army of engineers and developers whose sole job is to manage and maintain our API connectors. And we need to, because we have more than 600 of them!
The above estimates are based on what we consider the core costs involved in an in-house solution. However, there are a number of other factors and potential risks that, while less quantifiable in terms of actual cost, should nonetheless be considered by any business embarking on this type of project.
1. Administrative costs
Building an in-house solution requires an in-house team. Finding the right people takes time and resources, whether full-time, part-time, or contractors. Even if you already have the right people in-house, embarking on this project will take them away from other projects. Moreover, ongoing maintenance tasks will always require a dedicated team.
Once you’ve built an in-house team, someone also needs to manage them, track progress, coordinate workflows, ensure deliverables are met, and conduct performance reviews, and all these things take time away from other work.
Employees may require benefits like pensions, insurance, and training, plus there are also ongoing HR costs for payroll, compliance, and contract management.
These administrative overheads may seem insignificant; however, they can add up quickly, making an in-house solution far more expensive than it might initially seem.
2. Staff access
How will users access and interact with the data? Data democratization, getting the right data to the right people at the right time, is a crucial part of any data set-up. One solution is to build your own platform to allow marketers and other teams to more easily retrieve and analyze data. However, without a well-designed UI, users may struggle to navigate the system, increasing reliance on engineers and limiting data accessibility.
Beyond development, someone needs to manage this platform—handling updates, troubleshooting issues, and ensuring smooth operations. Then there’s training. Staff need to understand how to use the system effectively, and that means ongoing documentation, user guides, and support. Technical documentation isn’t a one-and-done task; it requires continuous updates and dedicated resources to maintain, adding yet another layer of cost and complexity.
3. Scalability
How will your in-house solution grow with your business? Without proper scalability, what works today may become slow, inefficient, or even obsolete as data volumes increase. More users, more data sources, and evolving business needs can strain your infrastructure, leading to performance issues and costly rework.
Managing scalability isn’t just about adding more storage—it requires planning for increased processing power, optimizing data pipelines, and ensuring systems can handle growing demand. Without a clear strategy, your in-house build could quickly become a bottleneck rather than a solution, requiring ongoing investment just to keep up.
4. Staff churn
What happens if the people who built and maintain your in-house solution leave? When key staff members go, they take critical knowledge with them—especially if documentation isn’t consistently maintained. While this risk will depend on the size of the team, internally built systems often rely on a few key individuals, creating single points of failure should they leave. This means that updates, troubleshooting, and even day-to-day operations can become major challenges.
While building an in-house data solution is possible, this doesn’t necessarily mean it is easy or cost-effective. Of course, the specific costs and timelines in this report won’t apply to everyone. Some businesses may have existing infrastructure or specialized staff that make things easier.
Nonetheless, the core challenges are universal—maintenance, security, scalability, staff turnover. These impact every in-house project.
If you do decide to take the in-house route, we urge you not to underestimate these challenges, the unforeseen costs, and the long-term business risks they can lead to.
Trust us, we’ve been there!
While we stand by everything we’ve said above, naturally, this document is designed to convince you to take the buy route and, hopefully, buy Adverity.
And the truth is that by opting for an off-the-shelf solution like Adverity, you can eliminate many of the costs and risks outlined above.
Rather than managing the complexities of an in-house build, you can instead rely on a team whose sole business is in building and maintaining effective data pipelines alongside a whole host of additional features designed to simplify data management for you and your business.
While Adverity isn’t free, in most cases it will work out considerably cheaper than an in-house build once all the costs and potential costs are considered.
API updates, system updates, security updates; with Adverity, you don’t need to worry about maintaining your data pipelines. It’s that simple.
Data privacy regulations are met without the need for dedicated internal compliance resources.
Adverity is built to accommodate a vast range of business sizes, meaning that scaling with your business needs is much simpler.
Alongside your data pipeline requirements, you also get access to a range of advanced features to maximize the value from your data.
Instead of spending months (or years) building a system, businesses can start leveraging their data far quicker.
If internal engineers leave, an in-house system can become unmanageable. With Adverity, continuity is guaranteed.
No need to worry about troubleshooting, API failures, or downtime. Expert support is always available.
Regular updates, new features, and optimizations are ongoing, ensuring the platform remains state-of-the-art.
Ultimately, choosing a solution like Adverity means you’re not just getting a product—you’re getting an entire company committed to making data work for you.
Instead of spending valuable time maintaining infrastructure, troubleshooting issues, and keeping up with the latest API changes, you can focus on what truly matters: growing your business, optimizing marketing campaigns, and making data-driven decisions.