Serialize section title from editor
cmc333333 opened this issue · comments
The ProseMirror editor isn't adding a title field to sections when it serializes them, meaning we lose that from the database.
This may be a good time to encode the associated schema as
sect: heading block+
to ensure we'll always have a heading to grab a title from. If we do that, though, we'll need to verify that the PDF parser emits that structure consistently.
As far as I can tell, the PDF parser always emits a sec before it emits a heading:
    def begin_heading(self, heading):
        while heading.level <= self.sec_level:
            self.cursor_stack.pop()
        while heading.level > self.sec_level:
            self.cursor_stack.append(self.cursor.add_child('sec'))
        self.cursor_stack.append(self.cursor.add_child('heading'))
        self.cursor.pdf_node = heading
The first while loop ensures that heading.level > self.sec_level, so the second while loop is guaranteed to execute at least once, meaning that at least one sec is going to be added immediately before a heading.
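To make that argument easier to poke at, here is a minimal, self-contained sketch of the same two-loop logic. It is not the real semdb.py code: the Node and Builder classes, the end_heading method, the sec_first_children helper, and the definition of sec_level as "number of sec nodes currently on the cursor stack" are all assumptions made for illustration, and begin_heading takes a bare integer level rather than a PDF heading object.

```python
class Node:
    """Hypothetical stand-in for a tree node; not the real semdb.py class."""

    def __init__(self, name):
        self.name = name
        self.children = []

    def add_child(self, name):
        child = Node(name)
        self.children.append(child)
        return child


class Builder:
    """Toy builder that mirrors the two while loops in begin_heading."""

    def __init__(self):
        self.root = Node('doc')
        self.cursor_stack = [self.root]

    @property
    def cursor(self):
        return self.cursor_stack[-1]

    @property
    def sec_level(self):
        # Assumption: nesting depth is the number of open 'sec' nodes.
        return sum(1 for node in self.cursor_stack if node.name == 'sec')

    def begin_heading(self, level):
        # Close sections until we are strictly shallower than the heading.
        while level <= self.sec_level:
            self.cursor_stack.pop()
        # Open sections until we reach the heading's level (runs >= once).
        while level > self.sec_level:
            self.cursor_stack.append(self.cursor.add_child('sec'))
        self.cursor_stack.append(self.cursor.add_child('heading'))

    def end_heading(self):
        # Hypothetical counterpart that closes the heading node.
        self.cursor_stack.pop()


def sec_first_children(node):
    """Return the name of the first child of every 'sec', in pre-order."""
    found = []
    if node.name == 'sec':
        found.append(node.children[0].name if node.children else None)
    for child in node.children:
        found.extend(sec_first_children(child))
    return found


def build(levels):
    builder = Builder()
    for level in levels:
        builder.begin_heading(level)
        builder.end_heading()
    return builder.root


if __name__ == '__main__':
    print(sec_first_children(build([1, 2, 2, 1])))
    print(sec_first_children(build([1, 3])))
```

In this toy model, every sec starts with a heading as long as heading levels never skip. If a document jumps from level 1 straight to level 3, the second loop runs twice and the intermediate sec gets another sec as its first child. Whether real PDFs hit that case is not something this sketch can tell us.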
That second while loop is also the only place in semdb.py where a sec is created, so it's also true that there aren't any secs that don't have headings immediately after them. So I think that part is cool, though perhaps we could add tests that ensure this is the case--or perhaps we could add some sort of schema validator (see #907) that ensures that once we've imported all the PDFs we can, the sec: heading block+ rule is valid for all our imported documents.
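As a rough idea of what such a validator could look like, here is a small sketch that walks a document tree and reports every sec whose children don't match heading block+. The dict-based document shape ('type' and 'children' keys), the find_violations name, the 'para' node type, and the assumption that "block" means any non-heading node (including a nested sec) are all hypothetical. The real schema and the approach discussed in #907 may differ.

```python
def find_violations(node, path='doc'):
    """Return the paths of every 'sec' that breaks 'sec: heading block+'."""
    violations = []
    children = node.get('children', [])

    if node['type'] == 'sec':
        types = [child['type'] for child in children]
        matches_rule = (
            len(types) >= 2                  # one heading plus at least one block
            and types[0] == 'heading'        # heading must come first
            and 'heading' not in types[1:]   # assumption: a heading is not a block
        )
        if not matches_rule:
            violations.append(path)

    # Recurse so nested sections are checked too.
    for index, child in enumerate(children):
        child_path = '{}/{}[{}]'.format(path, child['type'], index)
        violations.extend(find_violations(child, child_path))
    return violations


good_doc = {'type': 'doc', 'children': [
    {'type': 'sec', 'children': [
        {'type': 'heading'},
        {'type': 'para'},
    ]},
]}

bad_doc = {'type': 'doc', 'children': [
    {'type': 'sec', 'children': [
        {'type': 'para'},  # no heading, so there is no title to grab
    ]},
]}

if __name__ == '__main__':
    print(find_violations(good_doc))
    print(find_violations(bad_doc))
```

Running something like this over every imported document after a PDF import would surface the offending sections by path. Note that the + in the rule also rejects a sec that contains only a heading and nothing else.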
Excellent, thanks for verifying, @toolness!