Data processing performance

What makes a feed fast or slow to process, and the things that actually move the needle: catalog size, file format, and how much work you ask the platform to do.

Two sites can run the exact same products, and one finishes in seconds while the other grinds for a long time. The difference comes down to how much work the platform has to do and what shape your data is in. This page covers what actually affects processing speed, so you know which knobs matter when things feel slow.

What "processing" means here

Every time your site runs, the platform does three things: it imports your data, optimizes it, and exports it to your channels. Each of those stages takes time, and that time scales with two things: how much data you've got, and how much work you're asking the platform to do to it. Keep those two ideas in mind and most performance questions answer themselves.

Catalog size is the biggest factor

The single largest driver is how much data you're moving. A useful way to think about it is total cells: your number of products multiplied by your number of attributes.

Catalog | Rough scale (cells)
Small | Under 1 million cells
Medium | 1 to 10 million cells
Large | 10 to 50 million cells
After Review | 50 million+ cells

A site with 400,000 products and 80 attributes each is processing 32 million cells every run (400,000 × 80), which puts it in the Large band. More cells means more to read, transform, and write, so the bigger your catalog, the more every other choice on this page matters.
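
If you want to check where your own catalog lands, the arithmetic is easy to script. This is a minimal sketch, not a platform tool: the function names are made up for illustration, and treating each band's lower bound as inclusive is an assumption, since the table doesn't say which side the boundary falls on.

```python
def total_cells(products: int, attributes: int) -> int:
    """Total cells = number of products x number of attributes."""
    return products * attributes


def catalog_band(cells: int) -> str:
    """Map a cell count to the bands in the table above.

    Assumption: each lower bound is inclusive (exactly 10 million
    counts as Large, exactly 50 million as After Review).
    """
    if cells < 1_000_000:
        return "Small"
    if cells < 10_000_000:
        return "Medium"
    if cells < 50_000_000:
        return "Large"
    return "After Review"


cells = total_cells(400_000, 80)
print(cells, catalog_band(cells))  # 32000000 Large
```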

What's inside each cell counts too

Cell count isn't the whole story. The size of what's inside each cell matters just as much. A cell holding a short value like "in stock" is nothing to move around. A cell holding a long HTML product description, tags and all, is a different beast.

Big cell contents cost you twice:

  • Reading and writing. The more text a cell holds, the longer it takes to read in and write back out. A catalog full of long descriptions moves slower than the same number of products with short ones.
  • Transforming. Running a rule over a huge chunk of content, like a search-and-replace across a large block of HTML, is far slower than the same rule on a short string. Do that across millions of rows and it really stacks up.

So if your feed leans on large text fields like rich HTML descriptions, expect it to run slower than its product count alone would suggest. Trimming markup you don't need, early on, pays off across every run.
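
To get a feel for how much of a description is markup rather than text, you could run a sample through something like the sketch below before the data ever reaches the platform. It uses Python's standard html.parser module; the strip_markup helper and the sample description are hypothetical. It drops every tag, which is only appropriate when none of your channels need the HTML.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the text between tags and ignores the tags themselves."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_markup(html_text: str) -> str:
    """Return the text content of html_text with whitespace collapsed.

    Note: text from adjacent tags is joined with a space, so inline
    markup inside a single word would split that word.
    """
    collector = _TextCollector()
    collector.feed(html_text)
    collector.close()
    return " ".join(" ".join(collector.parts).split())


# Hypothetical sample description
description = (
    '<div class="desc"><p>Soft <b>cotton</b> tee.</p>'
    "<p>Machine washable.</p></div>"
)
plain = strip_markup(description)
print(plain)
print(len(description), "characters before,", len(plain), "after")
```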

File format makes a real difference

The format your data arrives and leaves in changes how much work the platform does outside of optimization. That cost lands twice: once when the platform reads the file coming in, and again when it writes the file going out.

Format | Processing cost | Reading and writing
CSV | Lowest | Quick to read in and quick to write out, since flat rows have almost nothing to parse
Excel | Low to medium | Reads and writes much like CSV, but the file itself is heavier to open
JSON | Medium | Nesting has to be walked when reading and rebuilt when writing
XML | Highest | Verbose tags are slow to parse on the way in and bulky to generate on the way out

CSV is the most performant choice for both import and export. It carries the least overhead and the simplest structure, so there's less for the platform to chew through on the way in and less to build on the way out. If you've got the choice and your data is flat, CSV is the fast lane.
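
One rough way to see the overhead is to write the same rows out in each format and compare the byte counts. The sketch below does that with Python's standard library. The two sample rows and the element names are invented, and encoded size is only a proxy for parsing and writing cost, not a measurement of the platform itself.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample rows
ROWS = [
    {"id": "1", "title": "Tee", "availability": "in stock"},
    {"id": "2", "title": "Hoodie", "availability": "out of stock"},
]


def as_csv(rows):
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def as_json(rows):
    return json.dumps(rows)


def as_xml(rows):
    root = ET.Element("products")
    for row in rows:
        product = ET.SubElement(root, "product")
        for name, value in row.items():
            ET.SubElement(product, name).text = value
    return ET.tostring(root, encoding="unicode")


def encoded_sizes(rows):
    """Bytes needed to hold the same rows in each format."""
    return {
        "csv": len(as_csv(rows).encode("utf-8")),
        "json": len(as_json(rows).encode("utf-8")),
        "xml": len(as_xml(rows).encode("utf-8")),
    }


sizes = encoded_sizes(ROWS)
for name, size in sizes.items():
    print(name, size, "bytes")
```

CSV names each attribute once in the header, while JSON repeats every key per row and XML repeats it twice per value (opening and closing tag), which is why the gap widens as the catalog grows.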

How much you ask the platform to do

Importing data is one thing. Reshaping it is where time really adds up. Every rule box and optimization you apply runs once per product, so the cost multiplies with your catalog size.

A few things to watch:

  • Heavy transformations. Complex rules, big lookups, and chained logic all take longer than a simple copy.
  • Data services. Calls that enrich or check your data add real time, especially the ones that reach outside the platform.
  • Images and media. Processing or validating image links is slower than handling plain text.

None of this means you should avoid optimizing your data. It just means that if a site feels slow, the number and complexity of your rules is one of the first places to look.
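
Because each rule runs once per product, a quick multiplication shows how rule count scales with catalog size. This is back-of-the-envelope arithmetic under the simplifying assumption that every rule touches every product; the function name and the rule counts are illustrative.

```python
def rule_applications(products: int, rules: int) -> int:
    """How many times a rule executes if every rule runs on every product."""
    return products * rules


# Same 400,000-product catalog, two hypothetical rule counts
print(rule_applications(400_000, 10))  # 4000000
print(rule_applications(400_000, 40))  # 16000000
```

The count says nothing about how heavy each rule is, so treat it as a lower bound on the difference: a few expensive rules can outweigh many cheap ones.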

How often, and how much, you run

Running your whole catalog from scratch every time is the slowest way to work. If only a handful of products changed, reprocessing everything is wasted effort.

This is where running only the changes helps. When the platform can tell what's new or different since the last run, it processes just that slice instead of the entire catalog, which is far faster. Pair that with a sensible schedule, and you're spending time only where it's actually needed.
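
If your own source system has to work out what changed before handing data over, one common approach is to fingerprint each row and compare against the previous run. The sketch below shows that idea only. It is not a description of how the platform detects changes, and the field names and the changed_rows helper are hypothetical.

```python
import hashlib
import json


def row_fingerprint(row: dict) -> str:
    """Stable hash of a row's content, independent of key order."""
    payload = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def changed_rows(current, previous_fingerprints, key="id"):
    """Return rows that are new or different since the last run,
    plus the fingerprints to store for the next run."""
    changed = []
    fingerprints = {}
    for row in current:
        fingerprint = row_fingerprint(row)
        fingerprints[row[key]] = fingerprint
        if previous_fingerprints.get(row[key]) != fingerprint:
            changed.append(row)
    return changed, fingerprints


# First run: everything counts as new
run_1 = [{"id": "1", "price": "9.99"}, {"id": "2", "price": "4.99"}]
_, stored = changed_rows(run_1, {})

# Second run: product 2 changed price, product 3 is new
run_2 = [
    {"id": "1", "price": "9.99"},
    {"id": "2", "price": "5.49"},
    {"id": "3", "price": "19.00"},
]
delta, stored = changed_rows(run_2, stored)
print([row["id"] for row in delta])  # ['2', '3']
```

This sketch does not track deletions. A product that disappears from the source would need separate handling.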

API vs. flat files, from a speed angle

The way data moves also shapes performance. Flat files are great for big batch updates: one file, processed in bulk. APIs can push changes through closer to real time, which is ideal for fast-moving data like stock and pricing, though they often come with rate limits that cap how much you can send at once. Neither is faster in every case. It depends on whether you're moving a lot at once or a little, often.
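
To see why rate limits matter for bulk moves, you can estimate the minimum time a rate-limited API needs to accept a given number of updates. Every number and parameter name below is a made-up example. Real limits vary by channel, so check the channel's own documentation.

```python
import math


def min_seconds_for_api_push(updates: int, items_per_request: int,
                             requests_per_second: float) -> float:
    """Lower bound on send time when the only constraint is the rate limit."""
    requests_needed = math.ceil(updates / items_per_request)
    return requests_needed / requests_per_second


# A small, frequent update: 500 stock changes
print(min_seconds_for_api_push(500, 100, 5))        # 1.0

# A full-catalog push: 2,000,000 products
print(min_seconds_for_api_push(2_000_000, 100, 5))  # 4000.0
```

Under these assumed limits the small update clears in about a second, while the full catalog takes over an hour, which is the kind of job where a single flat file tends to make more sense.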

How to improve performance

The three stages of a site run each give you ways to trim the work. Here's where to look in each one.

Import

The goal here is to bring in less, and bring it in cleanly.

  1. Import only what you need. Drop attributes you'll never use and filter out products you don't sell. Every column and row you skip is work the rest of the pipeline never has to do.
  2. Prefer CSV for the source. If you control the feed, a flat CSV reads in faster than nested JSON or XML.
  3. Pull only the changes when you can. If your source can hand over just what's new or updated, take it instead of re-importing the whole catalog each time.
  4. Match the schedule to reality. Don't import every hour if your source only updates once a day.
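
If you generate the source feed yourself, the first step can happen before the file is even written. Here is a minimal sketch that keeps only the wanted columns and rows from a CSV, using Python's csv module. The column names, the "discontinued" flag, and the import_needed helper are all hypothetical.

```python
import csv
import io

# Hypothetical list of attributes the rest of the pipeline actually uses
WANTED_ATTRIBUTES = ["id", "title", "price"]


def import_needed(source, wanted=WANTED_ATTRIBUTES, keep=lambda row: True):
    """Yield only the wanted columns of the rows that pass the keep() filter."""
    for row in csv.DictReader(source):
        if keep(row):
            yield {name: row[name] for name in wanted}


feed = io.StringIO(
    "id,title,price,internal_note,discontinued\n"
    "1,Tee,9.99,restock soon,no\n"
    "2,Cap,4.99,old season,yes\n"
)
rows = list(import_needed(feed, keep=lambda row: row["discontinued"] != "yes"))
print(rows)  # [{'id': '1', 'title': 'Tee', 'price': '9.99'}]
```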

Data transformation

This is where work multiplies across every product, so small savings add up fast.

  1. Optimize once, reuse everywhere. Apply shared rules at the intermediate level so every export inherits them, instead of repeating the same logic per channel.
  2. Keep rule boxes lean. Remove ones you no longer use, and reach for a simple rule before a complex one. Heavy regex and big lookups cost the most.
  3. Go easy on external data services. Calls that reach outside the platform are some of the slowest steps, so use them where they earn their keep and skip them where they don't.
  4. Drop products early. Exclude what you won't export near the start, so you're not transforming rows that never leave.
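
The first and last points above share a shape: filter first, run shared rules once, then let each export pick from the result. The sketch below models that ordering in plain Python. It is a conceptual illustration with invented function and field names, not how rule boxes are configured in the platform.

```python
def build_intermediate(products, is_exportable, shared_rules):
    """Drop unwanted products first, then apply shared rules once each."""
    intermediate = []
    for product in products:
        if not is_exportable(product):
            continue  # dropped before any rule runs on it
        for rule in shared_rules:
            product = rule(product)
        intermediate.append(product)
    return intermediate


def export_for(channel_attributes, intermediate):
    """Each channel reuses the already-transformed rows."""
    return [{name: p[name] for name in channel_attributes} for p in intermediate]


# Hypothetical products and rules
products = [
    {"id": "1", "title": "  tee ", "stock": 3},
    {"id": "2", "title": "cap", "stock": 0},
]
shared_rules = [
    lambda p: {**p, "title": p["title"].strip()},
    lambda p: {**p, "title": p["title"].capitalize()},
]

intermediate = build_intermediate(products, lambda p: p["stock"] > 0, shared_rules)
print(export_for(["id", "title"], intermediate))  # [{'id': '1', 'title': 'Tee'}]
```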

Export

The aim is to send each channel only what it asks for, only when it needs it.

  1. Export only the channel's attributes. Channels ignore fields they don't recognize, so there's no point building them.
  2. Send only the changes. When a channel supports delta updates, push what changed instead of the full feed every run.
  3. Pick the lightest format the channel allows. CSV writes out faster than XML.
  4. Right-size the frequency. Use API exports for frequent small updates and flat files for big batch runs, and schedule each export to match how often the channel actually needs fresh data.
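
As a small illustration of the first and third points, here is how a CSV limited to one channel's attributes might be written with Python's csv module. The attribute list is hypothetical, and write_channel_csv is an illustrative helper, not a platform function.

```python
import csv
import io


def write_channel_csv(rows, channel_attributes):
    """Write only the columns the channel asks for and ignore the rest."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=channel_attributes,
        extrasaction="ignore",  # silently skip keys not in fieldnames
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {"id": "1", "title": "Tee", "internal_margin": "0.42"},
    {"id": "2", "title": "Cap", "internal_margin": "0.37"},
]
print(write_channel_csv(rows, ["id", "title"]))
```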

In short

Processing speed comes down to volume and effort. Big catalogs, heavy formats like XML, and lots of complex rules all slow things down, while CSV, lean optimizations, and running only what changed all speed things up. When a site feels slow, start with the size of the data and the amount of work you're asking for.
