Introducing Space and Time Python Data Jobs
What it is and why we built it
If you’ve been keeping up with Space and Time, you’ve probably heard that we built the first and only ZK proof for SQL. Proof of SQL is an incredibly powerful tool that allows a smart contract to retrieve and process data with SQL in a cryptographically proven way, which opens up a wealth of new use cases for blockchain technology. But while SQL is a powerful and near-Turing-complete language, it doesn’t cover 100% of business use cases. To implement custom business logic, for example, you’ll eventually need to deploy arbitrary code. Chainlink built an impressive solution to this: Chainlink Functions. Functions runs JavaScript redundantly on Chainlink nodes, which come to consensus on the output. Now, smart contracts can access ZK-proven analytics and data processing with Proof of SQL, as well as fast-running scripts with Functions.
But there’s still another class of use cases that hasn’t been solved in Web3: long-running Python jobs. The businesses and developers building on Space and Time are working with data, and data engineers use Python, so we knew we needed to solve two things. First: enable users to leverage Python to extract data from their existing database, transform it, and load it into Space and Time in the easiest and fastest way possible, without actually writing code. Second: connect Python jobs to smart contracts in a cryptographically guaranteed way. Introducing Space and Time Python Data Jobs, now available in beta on the Space and Time Studio.
How it works
Getting data into Space and Time
Python Data Jobs accelerates the process of getting data into Space and Time from any offchain source without ever writing code. Earlier this year, Space and Time released AI SQL, an OpenAI-powered service that takes a natural language prompt like “show me the top 5 wallets on Sui with the most transactions ordered by balance,” converts it into a SQL query, and returns the result. We’re excited to share that Houston, the AI chatbot in the Space and Time Studio, can now generate simple ETL (extract, transform, load) scripts that grab data from source Web2 databases or Web3 decentralized storage platforms, prep it, and load it into Space and Time. Houston creates a script that connects to PostgreSQL (or Snowflake or IPFS, for example), understands what's in the database, transforms the data, creates the corresponding tables in SxT, and loads rows out of PostgreSQL and into SxT one at a time. Normally, a database migration like this is a long, expensive, and tedious job that requires hand-written Python. Now, you can do it with natural language in a single pass.
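To make that flow concrete, here is a minimal sketch of the kind of script Houston generates for a PostgreSQL-to-SxT load. The psycopg2 source side is standard; the SXT_API endpoint, token, and load_row call are hypothetical placeholders standing in for however your SxT environment accepts inserts, not the actual SxT API.

```python
# Sketch of a Houston-style ETL script: PostgreSQL -> Space and Time (SxT).
# The SxT endpoint and insert call below are hypothetical placeholders.

import psycopg2
import requests

SXT_API = "https://<your-sxt-endpoint>/v1"        # hypothetical endpoint
SXT_TOKEN = "<access token from the SxT Studio>"  # placeholder credential

def extract_rows(pg_dsn: str, table: str):
    """Stream rows out of the source PostgreSQL table as dicts."""
    with psycopg2.connect(pg_dsn) as conn, conn.cursor() as cur:
        cur.execute(f"SELECT * FROM {table}")
        columns = [desc[0] for desc in cur.description]
        for row in cur:
            yield dict(zip(columns, row))

def load_row(sxt_table: str, row: dict):
    """Insert a single row into an SxT table (hypothetical REST call)."""
    resp = requests.post(
        f"{SXT_API}/tables/{sxt_table}/rows",
        json=row,
        headers={"Authorization": f"Bearer {SXT_TOKEN}"},
    )
    resp.raise_for_status()

if __name__ == "__main__":
    # Table names are illustrative only.
    for row in extract_rows("dbname=app user=etl", "public.transactions"):
        load_row("ETH.TRANSACTIONS", row)
```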
Getting data out of Space and Time
Python Data Jobs can also be used to get data out of Space and Time, process it, and send it to a smart contract. This has yet to be solved in Web3 because Python jobs often run for a long time. For example, if you have a script to calculate the probability that BTC remains above $40k for the rest of the year, that script has to capture data from the markets, process it, and run a Monte Carlo simulation against it in Python, which altogether might take around 20 seconds. And if you’re connecting the result to a smart contract, you need to ensure that it’s tamperproof. Consensus-based proving is perfect for fast-running scripts, but it doesn’t work well for a script running that long. If you’re running the computation redundantly across, say, 30 nodes, node 1 might finish the job in 18 seconds, while node 5 finishes in 25 and node 15 finishes in 21; everyone waits on the slowest node, and the compute cost is multiplied by every node in the network. A new architecture is required.
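For a sense of the workload, here is a minimal sketch of the simulation step in that example: a Monte Carlo estimate, under a simple geometric Brownian motion assumption, of the probability that BTC stays above $40k. The spot, drift, and volatility inputs are placeholders that a real job would estimate from the market data it captured earlier.

```python
# Monte Carlo sketch: probability that BTC never dips below a barrier
# over the remaining trading days. Inputs are illustrative placeholders.

import numpy as np

def prob_price_stays_above(
    spot: float,            # current BTC price
    barrier: float,         # threshold the price must stay above
    days: int,              # days left in the year
    mu: float,              # annualized drift (estimated from market data)
    sigma: float,           # annualized volatility (estimated from market data)
    n_paths: int = 100_000,
) -> float:
    dt = 1 / 365
    # Simulate geometric Brownian motion paths day by day.
    shocks = np.random.normal(
        (mu - 0.5 * sigma**2) * dt, sigma * np.sqrt(dt), size=(n_paths, days)
    )
    paths = spot * np.exp(np.cumsum(shocks, axis=1))
    # A path "survives" if it never falls below the barrier.
    survived = paths.min(axis=1) > barrier
    return survived.mean()

print(prob_price_stays_above(spot=43_000, barrier=40_000, days=180, mu=0.1, sigma=0.6))
```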
Throughout the Python Data Jobs beta, Space and Time is working on a solution that does this with ZK. Today, it relies on optimistic security (similar in spirit to an optimistic rollup). When you run a Python Data Job in SxT, the inputs, outputs, and the code itself are all hashed to a major chain. The script is only run once, and if the outcome isn’t as expected, the user can request a proof and SxT cryptographically proves what was run. Instead of proving execution in real time with redundant computation and consensus, we run the job once and hash all of its metadata, creating a tamper-evident audit trail that disincentivizes node operators from manipulating the execution. We’re excited to share more soon about the ZK solution that will enhance the real-time security of Python Data Jobs.
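As a rough illustration of that audit trail (not SxT's actual encoding), here is how a job's code, inputs, and outputs could be folded into a single commitment hash that gets anchored onchain:

```python
# Sketch of an optimistic audit-trail commitment: hash the job's code,
# inputs, and outputs into one digest that can be posted to a major chain.
# The field names and encoding here are illustrative assumptions.

import hashlib
import json

def job_commitment(code: str, inputs: dict, outputs: dict) -> str:
    """Return a hex digest committing to what the job ran and produced."""
    payload = {
        "code_sha256": hashlib.sha256(code.encode()).hexdigest(),
        "inputs": inputs,
        "outputs": outputs,
    }
    # Canonical JSON encoding so the same job always yields the same digest.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

digest = job_commitment(
    code="print('hello')",
    inputs={"symbol": "BTC", "barrier": 40_000},
    outputs={"probability": 0.62},
)
print(digest)  # the value that would be anchored onchain
```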
What it enables
Seamless database migrations
Simply tell Houston what migration you want to do, give it access to the source database, and Houston uses the already-built prompt-to-SQL framework to retrieve information about that database. If you say, “write a Python script to load my Snowflake data into SxT,” Houston will prompt you for access and generate a Python script that queries Snowflake, grabs the data, figures out the schema, and replicates it to SxT in a single LLM inference.
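As an illustration of the schema-discovery step such a generated script performs, here is a minimal sketch that reads a table's columns from Snowflake's INFORMATION_SCHEMA and emits equivalent CREATE TABLE DDL. The type mapping and the target DDL dialect are simplifying assumptions, not the exact output Houston produces.

```python
# Sketch of schema discovery for a Snowflake-to-SxT migration script.
# The type mapping and generated DDL are illustrative only.

import snowflake.connector

# Very rough Snowflake -> SQL type mapping for illustration.
TYPE_MAP = {"NUMBER": "DECIMAL", "TEXT": "VARCHAR", "TIMESTAMP_NTZ": "TIMESTAMP"}

def snowflake_table_ddl(conn, database: str, schema: str, table: str) -> str:
    """Read a table's columns from INFORMATION_SCHEMA and emit CREATE TABLE DDL."""
    cur = conn.cursor()
    cur.execute(
        f"""
        SELECT column_name, data_type
        FROM {database}.INFORMATION_SCHEMA.COLUMNS
        WHERE table_schema = %s AND table_name = %s
        ORDER BY ordinal_position
        """,
        (schema, table),
    )
    cols = [f"{name} {TYPE_MAP.get(dtype, dtype)}" for name, dtype in cur.fetchall()]
    return f"CREATE TABLE {schema}.{table} ({', '.join(cols)})"

conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>"  # placeholders
)
print(snowflake_table_ddl(conn, "ANALYTICS", "PUBLIC", "TRADES"))
```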
Example use case: Truflation ingests massive volumes of real-time inflation data across dozens of different data feeds (commodities, bond rates, housing, etc.) into storage, then builds aggregations (inflation indexes) to be exposed onchain via oracles. With Python Data Jobs, these large volumes of data can be efficiently processed and prepped for the aggregations.
Example use case: dClimate periodically ETLs weather data from multiple weather feeds and stores it. Python Data Jobs can streamline this process by automating the extraction and transformation of the weather data.
Complex calculations for DeFi
Imagine if your smart contract could consume the results of complex offchain computations, like a forecast of a coin’s future performance under varying market conditions, in a tamperproof way. Python Data Jobs lets you integrate sophisticated financial models, like those used for predicting price movements or assessing risk factors, into your smart contract logic with optimistic security. This allows DeFi protocols to leverage more sophisticated business logic beyond what Proof of SQL enables.
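As a small example of the kind of model such a job could run, here is a sketch of a historical value-at-risk (VaR) calculation; the returns series is a stand-in for price data a real job would query from SxT before posting the result for a contract to consume.

```python
# Sketch of a historical VaR estimate over a series of daily returns.
# The returns below are placeholders for data queried from SxT.

import numpy as np

def historical_var(daily_returns: np.ndarray, confidence: float = 0.95) -> float:
    """Loss threshold not exceeded with the given confidence (positive fraction)."""
    return -np.percentile(daily_returns, 100 * (1 - confidence))

returns = np.random.normal(0.0005, 0.03, size=365)  # placeholder returns
print(f"1-day 95% VaR: {historical_var(returns):.2%}")
```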
Example use case: dYdX executes calculations for perpetual options/futures pricing offchain, because they require historical pricing input data and complex compute that cannot be executed by smart contracts onchain. Python Data Jobs allows these calculations to be done in a tamperproof way.
Example use case: 3Commas executes offchain machine learning models for DeFi/CeFi decisioning (swaps, futures, bot trades, etc.) in a centralized compute container environment. Python Data Jobs provides a Web3-native alternative.
Get started
You can get started building Python Data Jobs with Houston on the Space and Time Studio. To celebrate the beta release of our new product, we’re providing Python Data Jobs for free to all users for one month.