
Remove Height And Width From Inline Styles

I'm using BeautifulSoup to remove inline heights and widths from my elements. Solving it for images was simple: def remove_dimension_tags(tag): for attribute in ['width', 'hei

Solution 1:

A full walk-through would be:

from bs4 import BeautifulSoup
import re

string = """
    <div id="attachment_9565" class="wp-caption aligncenter" style="width: 2010px;background-color:red">
        <p>Some line here</p>
        <hr/>
        <p>Some other beautiful text over here</p>
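    </div>
    """

# Match a width or height declaration: the property name, a colon,
# the value up to the next semicolon, and the semicolon itself if present.
rx = re.compile(r'(?:width|height):[^;]+;?')

soup = BeautifulSoup(string, "html5lib")

# Only visit divs that actually carry a style attribute.
for div in soup.find_all('div', style=True):
    # Apply the regex to the tag's own style value, not to the whole document.
    div['style'] = rx.sub("", div['style'])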

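As others have noted, running a regular expression over the value of the style attribute (rather than over the HTML itself) is not a problem. Be aware that this pattern also matches the tail of properties such as max-width or line-height.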
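A stricter variant of the regex idea can be sketched as follows. The helper name strip_dimensions and the lookbehind are assumptions added for illustration, not part of the original answer; the lookbehind is meant to keep properties like max-width and line-height from being clipped.

```python
import re

# Hypothetical pattern: match "width" or "height" only when it is not
# preceded by a word character or a hyphen, so "max-width" and
# "line-height" are left alone.
DIMENSION_RX = re.compile(r'(?<![\w-])(?:width|height)\s*:[^;]*;?\s*', re.IGNORECASE)


def strip_dimensions(style):
    """Return the style string with width/height declarations removed."""
    return DIMENSION_RX.sub('', style).strip()


print(strip_dimensions("width: 2010px;background-color:red"))
print(strip_dimensions("max-width: 100%; height: 20px; color: blue"))
```

Inside the loop above, the result could be assigned back with div['style'] = strip_dimensions(div['style']).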

Solution 2:

You could use regex if you want, but there is a simpler way.

Use cssutils for simpler CSS parsing.

A simple example:

from bs4 import BeautifulSoup
import cssutils

s = '<div id="attachment_9565" class="wp-caption aligncenter" style="width: 2010px;background-color:red">'

soup = BeautifulSoup(s, "html.parser")
div = soup.find("div")
div_style = cssutils.parseStyle(div["style"])
del div_style["width"]
div["style"] = div_style.cssText
print(div)

Outputs:

<div class="wp-caption aligncenter" id="attachment_9565" style="background-color: red"></div>

Solution 3:

import re
import bs4

html = '''<div id="attachment_9565" class="wp-caption aligncenter" style="width: 2010px;background-color:red">'''

soup = bs4.BeautifulSoup(html, 'lxml')

A tag's attributes are stored in a dict, so you can read and modify them like one:

get item:

soup.div.attrs

{'class': ['wp-caption', 'aligncenter'],
 'id': 'attachment_9565',
 'style': 'width: 2010px;background-color:red'}

set item:

# Keeps only the last declaration; this works here because background-color comes last.
soup.div.attrs['style'] = soup.div.attrs['style'].split(';')[-1]

{'class': ['wp-caption', 'aligncenter'],
 'id': 'attachment_9565',
 'style': 'background-color:red'}

Or use a regex to keep only the declaration you want:

soup.div.attrs['style'] = re.search(r'background-color:\w+', soup.div.attrs['style']).group()
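Both approaches above depend on where background-color sits in the string. A position-independent sketch is to split the style into declarations and drop the unwanted ones by name. The function drop_properties is a hypothetical helper, not part of bs4, and it assumes no value contains a semicolon (a data URI inside url(), for example, would break the split).

```python
def drop_properties(style, names=("width", "height")):
    """Return the style string without the named properties.

    `names` is expected to hold lowercase property names.
    """
    kept = []
    for declaration in style.split(";"):
        prop, sep, value = declaration.partition(":")
        if not sep:
            continue  # skip empty or malformed chunks such as a trailing ";"
        if prop.strip().lower() in names:
            continue  # this is one of the properties to remove
        kept.append(f"{prop.strip()}: {value.strip()}")
    return "; ".join(kept)


print(drop_properties("width: 2010px;background-color:red"))
```

The result can be written back with soup.div.attrs['style'] = drop_properties(soup.div.attrs['style']). If it comes back empty, removing the attribute with del soup.div['style'] avoids leaving style="" behind.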
