Query the pipeline
This document assumes that you've already followed our installation guide.
Streaming systems are invaluable in modern data architectures due to their ability to process and analyze data in real-time as it flows through a system. One of the primary use cases that makes streaming systems so powerful is aggregation. By continuously processing incoming events, streaming systems allow for the dynamic creation and updating of table views, providing up-to-the-moment insights without the need for batch processing.
This real-time data transformation capability enables applications to maintain live dashboards, detect anomalies instantly, and make data-driven decisions with minimal latency. Ceramic offers a powerful data pipeline suitable for a variety of workloads.
Flight SQL library
Flight SQL is a protocol designed for efficiently handling and querying large-scale analytical data. It extends Apache Arrow Flight, an RPC framework built for high-performance data transfer in big data settings.
Ceramic nodes expose a Flight SQL endpoint which allows developers to query the various parquet tables across the Ceramic data pipeline.
Why Flight SQL?
Flight SQL organizes data in a columnar format for fast, memory-efficient handling, reducing the time needed to serialize, deserialize, and transfer data between clients and servers. At the same time, Flight SQL supports SQL queries, making it well suited to applications requiring SQL analytics.
Because the protocol is tailored for high-throughput transport across networks, it is ideal for moving large columnar datasets (like Parquet) between distributed systems. Furthermore, Parquet data can be loaded into an Arrow-based in-memory format and then queried and transmitted efficiently, which suits data-intensive workloads.
Set up your client
The @ceramic-sdk/flight-sql-client package is designed to run server-side only. Using this client requires a ceramic-one daemon running with experimental flags that point to your S3 bucket:
ceramic-one daemon --experimental-features --flight-sql-bind-address 0.0.0.0:5102 --object-store-url s3://your-bucket-name
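The daemon also needs credentials to reach the S3 bucket. As an assumption (not covered by this guide), its S3 client follows the common AWS SDK convention of reading credentials from environment variables, so you would typically export these before starting the daemon (placeholder values shown):

```shell
# Placeholder credentials - replace with your own.
# Assumption: the daemon's S3 client reads the standard AWS SDK variables.
export AWS_ACCESS_KEY_ID="your-access-key-id"
export AWS_SECRET_ACCESS_KEY="your-secret-access-key"
export AWS_REGION="us-east-1"
```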
Install the flight-sql package
npm install --save @ceramic-sdk/flight-sql-client apache-arrow
Create a client instance
import { type ClientOptions, createFlightSqlClient } from "@ceramic-sdk/flight-sql-client";
import { tableFromIPC } from "apache-arrow";

const OPTIONS: ClientOptions = {
  headers: [],
  username: undefined,
  password: undefined,
  token: undefined,
  tls: false,
  host: "127.0.0.1",
  port: 5102,
};

const client = await createFlightSqlClient(OPTIONS);
Execute a query
const buffer = await client.preparedQuery(
  "SELECT * FROM conclusion_events WHERE stream_type = $1",
  [["$1", "3"]]
);

// Deserialize the Arrow IPC buffer into a Table
const data = tableFromIPC(buffer);
const row = data.get(0);

// The event payload is stored as bytes - decode it into readable JSON
const content = JSON.parse(Buffer.from(row?.data).toString());
console.log(content);