ETL vs. ELT vs. QWELTY vs… – Pansynchro Technologies

A lot of words have been spilled in various corners of the data engineering world over whether ETL or ELT is the better data sync model. “ELT is the way of the Modern Data Stack!” proclaim the partisans of one camp. “Copy your raw data over to a data lake first, and only then should you try to transform it, because raw data is unreliable. It can change in unpredictable ways, and if a change that lands while you’re running an import breaks it, you lose the entire import job.” And they have a good point.

“That’s a feature, not a bug,” reply the old-school ETL die-hards. “Remember the Fail Fast principle: if something goes wrong, you want it to blow up very noticeably, as early as possible, to make it clear where the problem lies. If you’re several steps removed from the ingestion of bad data before you notice anything wrong, the problem is messier, harder to diagnose, and harder to fix.” And they have a good point too.
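To make the two positions concrete, here is a minimal Python sketch of the trade-off. It is not Pansynchro code: the names `clean`, `run_etl`, and `run_elt`, and the use of plain lists as a stand-in "lake" and "warehouse", are all assumptions made up for illustration.

```python
from typing import Iterable

Row = dict


def clean(row: Row) -> Row:
    """Hypothetical transform: raises ValueError on a malformed row."""
    return {"id": int(row["id"]), "amount": float(row["amount"])}


def run_etl(source: Iterable[Row], warehouse: list) -> None:
    """ETL: transform in flight, so a bad row aborts before anything is loaded."""
    transformed = [clean(row) for row in source]  # fails fast here
    warehouse.extend(transformed)


def run_elt(source: Iterable[Row], lake: list, warehouse: list) -> None:
    """ELT: land the raw rows first, then transform from the stored copy."""
    lake.extend(source)  # the raw copy survives even if the transform fails
    transformed = [clean(row) for row in lake]
    warehouse.extend(transformed)


# Illustrative input: the second row has an id that cannot be parsed.
rows = [{"id": "1", "amount": "9.50"}, {"id": "oops", "amount": "3"}]
```

With this input both functions raise `ValueError`. The difference is where the failure leaves you: `run_etl` stops with nothing loaded, which is the fail-fast argument, while `run_elt` stops with the raw rows already sitting in the lake, ready for the transform to be fixed and rerun.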

In an effort to set themselves apart in their marketing, some vendors have even started inventing new terms for modified sync processes, such as ELTP (Extract, Load, Transform, and Publish). (One might be forgiven for wondering what precisely the difference is between ELTP and traditional ETL, which it claims is obsolete…)

Does Pansynchro support my preferred sync model?

Yes.

But I didn’t say what it is yet…

That’s the beauty of it. You’re not tied down to one highly opinionated model of data synchronization. A sync is a sync; you’re free to arrange it however you’d like. If you want to transform data in flight, go ahead and do it. (We’ve got first-class support for it in the PanSQL scripting system.) If you prefer to run something like DBT after the initial sync has finished, we’re not going to stop you. In fact, we’re working on adding direct DBT support to the framework. If you want to run a sync and then take the results and copy them elsewhere, that’s fully allowed; there’s nothing magical about the sync command that requires a script to end immediately after running one.

The one strong opinion we won’t back down from is that your data sync should run quickly and efficiently. How you set it up is up to you.