Remove Height And Width From Inline Styles
I'm using BeautifulSoup to remove inline heights and widths from my elements. Solving it for images was simple: def remove_dimension_tags(tag): for attribute in ['width', 'hei
Solution 1:
A full walk-through would be:
from bs4 import BeautifulSoup
import re
string = """
<div id="attachment_9565" class="wp-caption aligncenter" style="width: 2010px;background-color:red">
<p>Some line here</p>
<hr/>
<p>Some other beautiful text over here</p>
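</div>
"""
# match a width or height declaration: the property name, a colon,
# then everything up to (and including) the next ;
rx = re.compile(r'(?:width|height):[^;]+;?')
soup = BeautifulSoup(string, "html5lib")
# only visit divs that actually carry a style attribute
for div in soup.find_all('div', style=True):
    # substitute within this div's own style value, not the whole document string
    div['style'] = rx.sub("", div['style'])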
As others have noted, running a regular expression over the attribute value itself (rather than over the HTML markup) is not a problem.
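One caveat with the pattern above is that it also matches the tail of properties such as max-width or line-height. A minimal sketch of a stricter variant follows; the names DIMENSION_RX and strip_dimensions are made up for illustration, and the function works on the style string only, so it needs nothing beyond the standard library:

```python
import re

# Match a whole `width: ...` or `height: ...` declaration, but skip
# properties that merely end in those words (max-width, line-height, ...).
# The lookbehind rejects a match preceded by a word character or a hyphen.
DIMENSION_RX = re.compile(r'(?<![\w-])(?:width|height)\s*:[^;]*;?\s*', re.IGNORECASE)


def strip_dimensions(style):
    """Return the style string with width/height declarations removed (hypothetical helper)."""
    return DIMENSION_RX.sub('', style).strip()


print(strip_dimensions("width: 2010px;background-color:red"))
print(strip_dimensions("max-width: 10px; height: 5px; color: blue"))
```

With BeautifulSoup, this could be applied as tag['style'] = strip_dimensions(tag['style']) for each tag returned by soup.find_all(style=True). It is still a regex over CSS text, so values that themselves contain "width:" or a semicolon would need a real CSS parser.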
Solution 2:
You could use a regex if you want, but there is a simpler way: use cssutils to parse the CSS for you.
A simple example:
from bs4 import BeautifulSoup
import cssutils
s = '<div id="attachment_9565" class="wp-caption aligncenter" style="width: 2010px;background-color:red">'
soup = BeautifulSoup(s, "html.parser")
div = soup.find("div")
div_style = cssutils.parseStyle(div["style"])
del div_style["width"]
div["style"] = div_style.cssText
print(div)
Outputs:
<div class="wp-caption aligncenter" id="attachment_9565" style="background-color: red"></div>
Solution 3:
import re
import bs4
html = '''<div id="attachment_9565" class="wp-caption aligncenter" style="width: 2010px;background-color:red">'''
soup = bs4.BeautifulSoup(html, 'lxml')
A tag's attributes are stored in a dict, so you can read and modify them like any other dict:
get item:
soup.div.attrs
{'class': ['wp-caption', 'aligncenter'],
'id': 'attachment_9565',
'style': 'width: 2010px;background-color:red'}
set item (this keeps only the last declaration, which works here because width comes first):
soup.div.attrs['style'] = soup.div.attrs['style'].split(';')[-1]
{'class': ['wp-caption', 'aligncenter'],
'id': 'attachment_9565',
'style': 'background-color:red'}
Or use a regex to keep only the declaration you want:
soup.div.attrs['style'] = re.search(r'background-color:\w+', soup.div.attrs['style']).group()
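Both variants above depend on knowing which declaration to keep and where it sits in the string. A rough sketch of a more general split-based filter, which drops named properties wherever they appear, could look like this; drop_properties and its unwanted parameter are hypothetical names, not part of BeautifulSoup:

```python
def drop_properties(style, unwanted=("width", "height")):
    """Return the style string without the unwanted properties (hypothetical helper)."""
    kept = []
    for declaration in style.split(";"):
        name, sep, value = declaration.partition(":")
        if not sep:
            continue  # skip empty fragments, e.g. after a trailing ';'
        if name.strip().lower() in unwanted:
            continue  # drop width/height, keep everything else
        kept.append(f"{name.strip()}: {value.strip()}")
    return "; ".join(kept)


print(drop_properties("width: 2010px;background-color:red"))
```

It could then be used as soup.div.attrs['style'] = drop_properties(soup.div.attrs['style']). Splitting on ';' is naive: a value that contains a semicolon (a data URL, for example) would be cut apart, which is where a real CSS parser such as cssutils is the safer choice.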