Changing Text without editing the formatting
powahftw opened this issue · comments
To edit some text, I currently use paragraphs and runs and re-apply all the styles I'm interested in. It would be useful to have a way to change just the text of an existing paragraph, leaving the formatting untouched.
@powahftw Character formatting (font characteristics) is specified at the Run level. A Paragraph object contains one or more (usually more) runs. When assigning to Paragraph.text, all the runs in the paragraph are replaced with a single new run. This is why the text formatting disappears: the runs that contained that formatting disappear.
Although it would not work for all cases one might want, a useful behavior would be to replace the text in a paragraph, retaining the formatting present in the first run. This could be accomplished like this:
def replace_paragraph_text_retaining_initial_formatting(paragraph, new_text):
    p = paragraph._p  # the lxml element containing the `<a:p>` paragraph element
    # remove all but the first run
    for idx, run in enumerate(paragraph.runs):
        if idx == 0:
            continue
        p.remove(run._r)
    paragraph.runs[0].text = new_text

paragraph = text_frame.paragraphs[0]  # or wherever you get the paragraph from
new_text = 'foobar'
replace_paragraph_text_retaining_initial_formatting(paragraph, new_text)
I haven't tested this, maybe you can report back any mistakes if you try it out, but I think it gives the gist.
This would be roughly how such a feature would be implemented.
@scanny Just tried your suggested function and it works perfectly - thanks!
wow... this is perfect. I had written a loop of try/except that stored the attributes of the first run and then re-applied them after changing the text. Feel like a barbarian.
Hi guys,
Thanks @scanny for the snippet! It was really helpful.
I did encounter something strange, though. For some unknown reason, the text in some cells in my file was split into a number of runs, despite looking like a single line of text.
So I had to modify your snippet into this:
...
# join without a separator: runs are contiguous pieces of the same text
whole_text = "".join(r.text for r in paragraph.runs)
whole_text = re.sub(replacement_string, new_text, whole_text)
p = paragraph._p
# remove all but the first run
for idx, run in enumerate(paragraph.runs):
    if idx == 0:
        continue
    p.remove(run._r)
paragraph.runs[0].text = whole_text
...
Hope this helps... or well, I just wanted to share the solution to the whole morning of frustration...
This really helps us.... thank you very much @scanny
Hi @scanny ,
Thanks for the snippet! Question though - is changing the text on the paragraph really supposed to clear the formatting or is that a bug?
Thanks
@franz-see Paragraph.text is a convenience property. There is no general-case way to change the text while preserving the formatting. For example, if you wanted to replace:
The quick, brown fox.
with:
The lazy yellow dog.
How would you do that with something like Paragraph.text? So assigning text to Paragraph.text replaces all the runs in the paragraph with a single run containing the assigned text with no special formatting.
Character formatting provided by the paragraph-style is preserved, and generally this produces the best possible result. If you need to apply "inline" character formatting, then you need to do it yourself, run-by-run.
One way to do this is to assign "" to paragraph.text to "clear" the existing text and then add runs to the paragraph to suit.
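The clear-and-rebuild approach described above might look roughly like the following. This is a minimal sketch, not part of python-pptx: `rebuild_runs`, `segments`, and `font_attrs` are hypothetical names, and the paragraph is assumed to be a python-pptx paragraph (only `paragraph.text`, `paragraph.add_run()`, `run.text`, and `run.font` are used).

```python
def rebuild_runs(paragraph, segments):
    """Replace a paragraph's content with one run per (text, font_attrs) segment.

    `segments` is a list of (text, font_attrs) pairs, where `font_attrs` is a
    dict of font attribute names to values, e.g. {"italic": True}.
    Both names are illustrative assumptions, not library API.
    """
    paragraph.text = ""  # "clear" the existing text, dropping the old runs
    for text, font_attrs in segments:
        run = paragraph.add_run()
        run.text = text
        for name, value in font_attrs.items():
            setattr(run.font, name, value)  # e.g. bold, italic, underline
    return paragraph


# Usage sketch (assumes `paragraph` came from a python-pptx text frame):
# rebuild_runs(paragraph, [
#     ("The ", {}),
#     ("lazy", {"italic": True}),
#     (" yellow dog.", {}),
# ])
```

Each segment gets its own run, so the caller decides explicitly which inline formatting goes where. Formatting inherited from the paragraph style is not touched.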
That's really useful, thanks @scanny.
Is there a way to keep the formatting when replacing text in tables (cells), too?
To be honest, I don't fully understand what your above code does, so I'd appreciate any help.
def replace_text(self, replacements: dict, shapes: list):
    for shape in shapes:
        for match, replacement in replacements.items():
            if shape.has_table:
                for row in shape.table.rows:
                    for cell in row.cells:
                        if match in cell.text:
                            new_text = cell.text.replace(str(match), str(replacement))
                            cell.text = new_text
from: https://stackoverflow.com/questions/37924808/python-pptx-power-point-find-and-replace-text-ctrl-h
I think I solved it. It was just a matter of finding how to access the run level for cells. I'll leave the solution here in case someone encounters a similar problem.
Thank you for this great module!
for shape in shapes:
    for match, replacement in replacements.items():
        if shape.has_table:
            for row in shape.table.rows:
                for cell in row.cells:
                    if match in cell.text:
                        for paragraph in cell.text_frame.paragraphs:
                            for run in paragraph.runs:
                                p = paragraph._p  # the lxml element containing the `<a:p>` paragraph element
                                # remove all but the first run
                                for idx, run in enumerate(paragraph.runs):
                                    if idx == 0:
                                        continue
                                    p.remove(run._r)
                                cur_text = run.text
                                new_text = cur_text.replace(str(match), str(replacement))
                                run.text = new_text
Same code, slightly refactored for length and indent-level. I think there was a bug in there too, you deleted runs before capturing the text they contain.
def iter_table_cells(shapes):
    for shape in shapes:
        if not shape.has_table:
            continue
        for row in shape.table.rows:
            for cell in row.cells:
                yield cell

for cell in iter_table_cells(shapes):
    for match, replacement in replacements.items():
        for paragraph in cell.text_frame.paragraphs:
            if match not in paragraph.text:
                continue
            orig_text = paragraph.text
            # --- the lxml element containing the `<a:p>` paragraph element ---
            p = paragraph._p
            # --- remove all but the first run ---
            for run in paragraph.runs[1:]:
                p.remove(run._r)
            run = paragraph.runs[0]
            run.text = orig_text.replace(str(match), str(replacement))
You are very kind. Thanks again.
I am trying to highlight a specific word in red color in a pptx file using the below function.
def highlight_word_in_text(paragraph,highlight_word):
    p = paragraph._p
    paratext = p.text
    if highlight_word in paratext:
        for idx, run in enumerate(paragraph.runs):
            if idx == 0:
                continue
            p.remove(run._r)
        paragraph.runs[0].text = paratext[0:paratext.index(highlight_word)]
        run = paragraph.add_run()
        run.text = paratext[paratext.index(highlight_word):paratext.index(highlight_word)+len(highlight_word)]
        run.font.color.rgb = RGBColor(255, 0, 0)
        run = paragraph.add_run()
        run.text = paratext[paratext.index(highlight_word)+len(highlight_word):]
While it works, it loses the formatting and also adds some unusual characters like '_x000B' to some words where it finds the highlighted word. Could you please tell me what I am missing?
Full code below:
from pptx import Presentation
from pptx.dml.color import RGBColor

def highlight_word_in_text(paragraph,highlight_word):
    p = paragraph._p
    paratext = p.text
    if highlight_word in paratext:
        for idx, run in enumerate(paragraph.runs):
            if idx == 0:
                continue
            p.remove(run._r)
        paragraph.runs[0].text = paratext[0:paratext.index(highlight_word)]
        run = paragraph.add_run()
        run.text = paratext[paratext.index(highlight_word):paratext.index(highlight_word)+len(highlight_word)]
        run.font.color.rgb = RGBColor(255, 0, 0)
        run = paragraph.add_run()
        run.text = paratext[paratext.index(highlight_word)+len(highlight_word):]

prs2 = Presentation('test2.pptx')
for slide in prs2.slides:
    for shape in slide.shapes:
        if not shape.has_text_frame:
            continue
        for paragraph in shape.text_frame.paragraphs:
            highlight_word_in_text(paragraph,'business')
prs2.save('test3.pptx')
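One possible direction for the highlighting question, offered as a sketch under assumptions rather than a confirmed fix. The formatting loss is probably because runs created with add_run() start with no character formatting, so the new runs do not inherit what the first run had. The '_x000B' text is likely an escaped vertical-tab character, which paragraph-level text uses to stand for a line break and which cannot be stored in a run; that is an assumption worth checking against your python-pptx version. The helper names below (`split_on_word`, `copy_basic_font`, `highlight_first_match`) are hypothetical, only a few basic font attributes are copied, and paragraphs containing a line break are simply skipped.

```python
def split_on_word(text, word):
    """Return (before, match, after) around the first occurrence, or None."""
    start = text.find(word)
    if start < 0:
        return None
    end = start + len(word)
    return text[:start], text[start:end], text[end:]


def copy_basic_font(src_font, dst_font):
    """Copy a handful of simple font attributes; color and fill are not copied."""
    for attr in ("name", "size", "bold", "italic", "underline"):
        setattr(dst_font, attr, getattr(src_font, attr))


def highlight_first_match(paragraph, word, rgb):
    """Color the first occurrence of `word`, reusing the first run's basic font.

    Returns True if the paragraph was changed. `rgb` would be something like
    RGBColor(255, 0, 0) from pptx.dml.color.
    """
    runs = paragraph.runs
    text = paragraph.text
    # Assumption: "\v" marks a line break in paragraph text; skip such
    # paragraphs instead of writing that character into a run.
    if not runs or "\v" in text:
        return False
    parts = split_on_word(text, word)
    if parts is None:
        return False
    before, match, after = parts
    first = runs[0]
    # remove all but the first run, after the text has been captured
    for run in runs[1:]:
        paragraph._p.remove(run._r)
    first.text = before
    for segment, is_match in ((match, True), (after, False)):
        run = paragraph.add_run()
        run.text = segment
        copy_basic_font(first.font, run.font)
        if is_match:
            run.font.color.rgb = rgb
    return True
```

As with the snippets earlier in the thread, everything ends up with the first run's formatting, so inline formatting carried by later runs is still lost.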