ropensci / rdflib

:package: High level wrapper around the redland package for common rdf applications

Home Page:https://docs.ropensci.org/rdflib

as_rdf needs to escape certain characters

josephguillaume opened this issue

One of as_rdf.data.frame, write_nquads, normalize_table or poor_mans_nquads needs to escape certain characters. Otherwise rdf_parse, and therefore as_rdf, either returns an error or returns no content.

The characters to be escaped include at least double quotes in string literals and spaces in predicates.
Examples follow below. There are obvious solutions to these particular cases, but it's not clear to me what would be needed for the solutions to be generally applicable and not cause any regressions.
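Because both failures are silent, a pre-check can at least flag the input that would trip the parser. The following is only a sketch in base R: `find_unescaped` is a hypothetical helper, not part of rdflib, and the two character sets are my reading of the N-Quads grammar (whitespace and ``<>"{}|^`\`` are not allowed inside an IRI; `"`, `\`, line feed and carriage return must be escaped inside a quoted literal).

```r
# Hypothetical helper (not part of rdflib): report which column names and
# which columns contain characters that would need escaping in N-Quads.
find_unescaped <- function(df) {
  # Assumed from the N-Quads grammar: not allowed inside <...>
  iri_pattern <- "[\\s<>\"{}|^`\\\\]"
  # Assumed from the N-Quads grammar: must be escaped inside "..."
  literal_pattern <- "[\"\\\\\\n\\r]"

  bad_names <- grepl(iri_pattern, names(df), perl = TRUE)
  bad_values <- vapply(
    df,
    function(col) any(grepl(literal_pattern, as.character(col), perl = TRUE)),
    logical(1)
  )

  list(
    predicates = names(df)[bad_names],        # names unsafe as predicates
    columns    = names(df)[unname(bad_values)] # columns with unsafe values
  )
}

df <- data.frame(1, 'string with "quotes"', stringsAsFactors = FALSE)
names(df) <- c("id", "multiple word predicate")
find_unescaped(df)
```

A check like this does not fix anything, but it could be used to raise a warning before as_rdf returns an empty graph.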

A multi-word predicate silently yields no triples unless it is URL-encoded:

df <- data.frame(1,1)
names(df) <- c("id","multiple word predicate")
g <- as_rdf(df,prefix = "http://example.org#",key_column = "id")
g
# Total of 0 triples, stored in hashes

ntab<-rdflib:::normalize_table(df,key_column = "id")
ntab$predicate<-sapply(ntab$predicate,URLencode)
rdflib:::poor_mans_nquads(ntab,"temp.nquads",prefix="http://example.org#")
g<-rdf_parse("temp.nquads",format="nquads")
g
# Total of 1 triples, stored in hashes
#-------------------------------
#  <http://example.org#1> <http://example.org#multiple%20word%20predicate> "1"^^<http://www.w3.org/2001/XMLSchema#decimal> .

# As a workaround, a user can URLencode the data.frame names
df <- data.frame(1,1)
names(df) <- c("id","multiple word predicate")
names(df) <- sapply(names(df),URLencode)
g <- as_rdf(df,prefix = "http://example.org#",key_column = "id")
g
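The same workaround could be wrapped in a small function. This is a sketch only: `encode_predicate_names` is a hypothetical name, and leaving the key column untouched is my assumption, made so that the `key_column` argument passed to as_rdf still matches.

```r
# Hypothetical helper (not part of rdflib): URL-encode every column name
# except the key column, so the names are safe to use as predicates.
encode_predicate_names <- function(df, key_column) {
  nms <- names(df)
  is_key <- nms == key_column
  nms[!is_key] <- vapply(
    nms[!is_key], utils::URLencode, character(1), USE.NAMES = FALSE
  )
  names(df) <- nms
  df
}

df <- data.frame(1, 1)
names(df) <- c("id", "multiple word predicate")
names(encode_predicate_names(df, key_column = "id"))
```

One caveat: with its default `repeated = FALSE`, URLencode leaves a string alone if it already appears to contain percent-escapes, so a name that mixes a literal `%20` with real spaces would not be encoded.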

A string containing double quotes silently yields no triples unless each quote is preceded by a backslash escape (and the backslash itself must be escaped in the R source):

df <- data.frame(1,'string with "quotes"')
names(df) <- c("id","predicate")
g <- as_rdf(df,prefix = "http://example.org#",key_column = "id")
g
# Total of 0 triples, stored in hashes

ntab<-rdflib:::normalize_table(df,key_column = "id")
# With fixed=TRUE the replacement is literal: a backslash followed by a double quote
ntab$object<-gsub('"','\\"',ntab$object,fixed=TRUE)
rdflib:::poor_mans_nquads(ntab,"temp.nquads",prefix="http://example.org#")
g<-rdf_parse("temp.nquads",format="nquads")
g
# Total of 1 triples, stored in hashes
# -------------------------------
#  <http://example.org#1> <http://example.org#predicate> "string with "quotes""^^<http://www.w3.org/2001/XMLSchema#string> .


# As a workaround, a user can replace quotes within the relevant columns of the data.frame
df <- data.frame(1,'string with "quotes"')
names(df) <- c("id","predicate")
df$predicate <- gsub('"','\\"',df$predicate,fixed=TRUE)
g <- as_rdf(df,prefix = "http://example.org#",key_column = "id")
g
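A more general version of this workaround might escape every text column at once. The sketch below rests on assumptions: `escape_literal_columns` is a hypothetical helper, not part of rdflib. Escaping backslashes, line feeds and carriage returns as well as quotes follows my reading of the N-Quads grammar; this issue only demonstrates the double-quote case. The key column is skipped because it becomes part of the subject rather than a literal.

```r
# Hypothetical helper (not part of rdflib): escape characters that are
# special inside an N-Quads string literal, in all text columns of df.
escape_literal_columns <- function(df, key_column) {
  escape_one <- function(x) {
    x <- gsub("\\", "\\\\", x, fixed = TRUE) # backslashes first
    x <- gsub("\"", "\\\"", x, fixed = TRUE) # then double quotes
    x <- gsub("\n", "\\n", x, fixed = TRUE)  # line feed -> \n
    gsub("\r", "\\r", x, fixed = TRUE)       # carriage return -> \r
  }
  is_text <- vapply(
    df, function(col) is.character(col) || is.factor(col), logical(1)
  )
  is_text[names(df) == key_column] <- FALSE  # assumption: keys are not literals
  if (any(is_text)) {
    df[is_text] <- lapply(df[is_text], function(col) escape_one(as.character(col)))
  }
  df
}

df <- data.frame(id = 1, predicate = 'string with "quotes"',
                 stringsAsFactors = FALSE)
escape_literal_columns(df, key_column = "id")$predicate
```

Backslashes are handled first so that the backslashes introduced by the later replacements are not escaped a second time. Whether the right place for this logic is the user's data, normalize_table or poor_mans_nquads is the open question raised above.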