apache / pulsar

Apache Pulsar - distributed pub-sub messaging system

Home Page: https://pulsar.apache.org/

PostgreSQL sink connector doesn't persist message to table

alexandrebrilhante opened this issue

Search before asking

  • I searched in the issues and found nothing similar.

Read release policy

  • I understand that unsupported versions don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.

Version

OS: macOS Sonoma 14.4.1
Java: OpenJDK 17.0.11
Pulsar: 3.2.1

Minimal reproduce step

The example detailed here seems outdated. I've followed every step but still can't see new records in PostgreSQL. For comparison, there seems to be no issue when switching to Cassandra with the same schema and producer setup. I've tried with both local and dockerized Postgres databases.

pulsar-postgres-jdbc-sink.yaml

configs:
  userName: "postgres"
  password: "postgres"
  jdbcUrl: "jdbc:postgresql://localhost:5432/postgres"
  tableName: "pulsar_postgres_jdbc_sink"

schema

{
  "type": "AVRO",
  "schema": "{\"type\":\"record\",\"name\":\"test\",\"fields\":[{\"name\":\"id\",\"type\":[\"null\",\"int\"]},{\"name\":\"name\",\"type\":[\"null\",\"string\"]}]}",
  "properties": {}
}
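To make the nullable unions in the schema above concrete, here is a minimal sketch of how the `test` record could be mirrored on the Rust side. The `TestRecord` type and its `is_all_null` helper are hypothetical names introduced only for illustration; they are not part of the setup described in this issue.

```rust
/// Hypothetical Rust mirror of the Avro record `test` from the schema above.
/// Each Avro union `["null", T]` is modeled as `Option<T>`.
#[derive(Debug, Clone, PartialEq, Default)]
struct TestRecord {
    /// Avro `["null", "int"]`: a nullable 32-bit signed integer.
    id: Option<i32>,
    /// Avro `["null", "string"]`: a nullable UTF-8 string.
    name: Option<String>,
}

impl TestRecord {
    /// True when every field is null, i.e. the record carries no data.
    fn is_all_null(&self) -> bool {
        self.id.is_none() && self.name.is_none()
    }
}

fn main() {
    let full = TestRecord {
        id: Some(1),
        name: Some("abcdefg".to_string()),
    };
    let empty = TestRecord::default();

    println!("{:?} all null: {}", full, full.is_all_null());
    println!("{:?} all null: {}", empty, empty.is_all_null());
}
```

Because both fields are unions with `null`, a record in which neither field is set is still valid against this schema.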

main.rs - dummy data, e.g. {"id": 1, "name": "abcdefg"}, arrives on 127.0.0.1:9999 and main then sends it to Pulsar.

use pulsar::{producer::ProducerOptions, Pulsar, TokioExecutor};
use tokio::{io::AsyncReadExt, net::TcpListener, sync::mpsc};

#[tokio::main]
async fn main() {
    let addr: &str = "pulsar://localhost:6650";

    let pulsar: Pulsar<_> = Pulsar::builder(addr, TokioExecutor)
        .build()
        .await
        .expect("Failed to connect to Pulsar...");

    let topic_name: &str = "persistent://public/default/pulsar-postgres-jdbc-sink-topic";

    let mut producer: pulsar::Producer<TokioExecutor> = pulsar
        .producer()
        .with_topic(topic_name)
        .with_name("producer")
        .with_options(ProducerOptions {
            batch_size: Some(4),
            ..Default::default()
        })
        .build()
        .await
        .expect("Failed to create producer...");

    let (tx, mut rx) = mpsc::channel(100);

    let _producer_task: tokio::task::JoinHandle<()> = tokio::spawn(async move {
        while let Some(message) = rx.recv().await {
            match producer.send(message).await {
                Ok(_) => println!("Message sent to Pulsar..."),
                Err(e) => eprintln!("Failed to send message to Pulsar; err = {:?}...", e),
            }
        }
    });

    let listener: TcpListener = TcpListener::bind("127.0.0.1:9999")
        .await
        .expect("Failed to bind to address...");

    loop {
        let (mut socket, _addr) = listener
            .accept()
            .await
            .expect("Failed to accept connection...");

        let tx: mpsc::Sender<String> = tx.clone();

        tokio::spawn(async move {
            let mut buf: [u8; 1024] = [0; 1024];

            loop {
                let n: usize = match socket.read(&mut buf).await {
                    Ok(0) => return, // connection closed by the peer
                    Ok(n) => n,
                    Err(e) => {
                        eprintln!("Failed to read from socket; err = {:?}...", e);
                        return;
                    }
                };

                let message = String::from_utf8_lossy(&buf[0..n]).to_string();

                if tx.send(message).await.is_err() {
                    eprintln!("Failed to send message to channel...");
                    return;
                }
            }
        });
    }
}

Complete setup:

bin/pulsar standalone

bin/pulsar-admin schemas upload pulsar-postgres-jdbc-sink-topic -f $PWD/pulsar/connectors/schema

bin/pulsar-admin sinks create \
    --archive $PWD/pulsar/connectors/pulsar-io-jdbc-postgres-3.2.2.nar \
    --inputs pulsar-postgres-jdbc-sink-topic \
    --name pulsar-postgres-jdbc-sink \
    --sink-config-file $PWD/pulsar/connectors/pulsar-postgres-jdbc-sink.yaml \
    --parallelism 1

cargo build --release && cargo run --release

What did you expect to see?

PostgreSQL table pulsar_postgres_jdbc_sink being populated in real time.

What did you see instead?

PostgreSQL table pulsar_postgres_jdbc_sink is empty although Pulsar is producing the messages properly.

Anything else?

No issues when inspecting the sink or the topic. Pulsar is able to produce the messages.

Are you willing to submit a PR?

  • I'm willing to submit a PR!

This seems to be related to the schema Postgres is using, which is weird because I'm using the same schema as the getting started example for Pulsar IO.

org.postgresql.util.PSQLException: ERROR: null value in column "id" of relation "pulsar_postgres_jdbc_sink" violates not-null constraint
  Detail: Failing row contains (null, null).
	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2676) ~[postgresql-42.5.1.jar:42.5.1]
	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2366) ~[postgresql-42.5.1.jar:42.5.1]
	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:356) ~[postgresql-42.5.1.jar:42.5.1]
	at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:496) ~[postgresql-42.5.1.jar:42.5.1]
	at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:413) ~[postgresql-42.5.1.jar:42.5.1]
	at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:190) ~[postgresql-42.5.1.jar:42.5.1]
	at org.postgresql.jdbc.PgPreparedStatement.execute(PgPreparedStatement.java:177) ~[postgresql-42.5.1.jar:42.5.1]
	at org.apache.pulsar.io.jdbc.JdbcAbstractSink.flush(JdbcAbstractSink.java:289) ~[pulsar-io-jdbc-core-3.2.2.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
	at java.lang.Thread.run(Thread.java:840) ~[?:?]

Looks like I was trying to persist {"id": 1, "name": "abcdefg"} as opposed to just {"name": "abcdefg"} as expected by the sink connector.
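
For anyone hitting the same error, here is a minimal sketch of building the name-only payload on the producer side. It assumes the `id` column is filled in by the database (for example a serial column), which this issue does not state explicitly. `escape_json` and `name_only_payload` are hypothetical helpers, not part of the pulsar crate.

```rust
/// Escape a string so it can be embedded in a JSON string literal.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical helper: build a payload that carries only `name`,
/// leaving `id` out (assumption: the database assigns it).
fn name_only_payload(name: &str) -> String {
    format!("{{\"name\": \"{}\"}}", escape_json(name))
}

fn main() {
    // The resulting string could be passed to `tx.send(...)` in main.rs
    // in place of the raw bytes read from the socket.
    println!("{}", name_only_payload("abcdefg"));
}
```

Writing the escaping by hand keeps the sketch free of extra dependencies; in a real project a JSON library would be the safer choice.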