Category Archives: beautifulsoup

mod_wsgi: Compiled version different from runtime version when using BeautifulSoup

I'm developing a web framework, running Python 3.4 on Windows 7 with Apache 2.4.16 (Apache Haus) and mod_wsgi installed from mod_wsgi-4.4.13+ap24vc10-cp34-none-win32.whl.

My httpd.conf:

<VirtualHost *:80>

    ServerAdmin [email protected]

    DocumentRoot G:/Insanity/Web/Apache24/htdocs

    WSGIScriptAlias /example G:/Insanity/Web/Apache24/htdocs/example.py

</VirtualHost>

WSGIApplicationGroup %{GLOBAL}

When I run this example test without BeautifulSoup, it works just fine:

class App(object):

    def __init__(self, name):
        self.name = name
        self.html = \
        str.encode("""
            <html>
                <head>
                    <title>{name}</title>
                </head>
                <body>
                    <h1>Amazonia App is working!</h1>
                </body>
            </html>
        """.format(name = self.name))

    def __call__(self, environ, start_response):
        start_response("200 OK", [("Content-type", "text/html"),
                                        ('Content-Length', str(len(self.html)))])

        return [self.html]

application = App("Whatever!")
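A WSGI application is just a callable that takes `environ` and `start_response`, so one way to rule Apache out is to call the app in-process. The sketch below is a hypothetical helper (`call_wsgi_app` is not part of the original code) that uses the standard library's `wsgiref.util.setup_testing_defaults` to fill in a minimal `environ` and captures what the app passes to `start_response`:

```python
from wsgiref.util import setup_testing_defaults


def call_wsgi_app(app, path="/"):
    """Call a WSGI application in-process and return (status, headers, body)."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required CGI/WSGI keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers
        return lambda data: None  # legacy write() callable, unused here

    chunks = app(environ, start_response)
    try:
        body = b"".join(chunks)
    finally:
        # WSGI requires calling close() on the iterable if it has one
        if hasattr(chunks, "close"):
            chunks.close()
    return captured["status"], captured["headers"], body
```

For the app above, `call_wsgi_app(App("Whatever!"))` returns "200 OK" and the encoded page. Running the same check against the BeautifulSoup version from the command line, with the same Python installation that mod_wsgi loads, helps separate application errors from server configuration problems.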


But when I run it with BeautifulSoup:

from bs4 import BeautifulSoup

class WebPage(object):

    def __init__(self, html=""):
        if html:
            self.html = html
        else:
            self.html = \
            """
                <!DOCTYPE html>
                <html>
                    <head>
                        <title></title>
                    </head>
                    <body></body>
                </html>
            """
        # Parse in both branches so add_css() also works when html is passed in
        self.soup = BeautifulSoup(self.html, "html.parser")

    def add_css(self, *links):
        if links:
            for link in links:
                # HTML attributes are separated by spaces, not commas
                new_soup = BeautifulSoup(
                    '<link rel="stylesheet" href="{external_css}" type="text/css">'
                    .format(external_css=link), "html.parser")
                self.soup.head.insert(0, new_soup.find("link"))
            self.update_document()

    def update_document(self):
        self.html = self.__str__()

    def __str__(self):
        return self.soup.prettify()

class WebApp(object):

    def __init__(self, webpage):
        self.webpage = webpage

    def __call__(self, environ, start_response):
        start_response("200 OK", [("Content-type", "text/html")])

        return [self.webpage.html.encode("utf-8")]

# Assumed entry point: mod_wsgi looks for a module-level callable named "application"
application = WebApp(WebPage())

The app does not load when I try to access it, and logs/error.log contains:

[Tue Oct 06 14:44:43.125597 2015] [mpm_winnt:notice] [pid 7052:tid 332] AH00456: Server built: Jul 13 2015 10:37:15
[Tue Oct 06 14:44:43.125597 2015] [core:notice] [pid 7052:tid 332] AH00094: Command line: 'G:\\Insanity\\Web\\Apache24\\bin\\httpd.exe -d G:/Insanity/Web/Apache24'
[Tue Oct 06 14:44:43.126597 2015] [mpm_winnt:notice] [pid 7052:tid 332] AH00418: Parent: Created child process 5920
[Tue Oct 06 14:44:43.642398 2015] [ssl:warn] [pid 5920:tid 344] AH01909: localhost:443:0 server certificate does NOT include an ID which matches the server name
[Tue Oct 06 14:44:43.938799 2015] [ssl:warn] [pid 5920:tid 344] AH01909: localhost:443:0 server certificate does NOT include an ID which matches the server name
[Tue Oct 06 14:44:43.938799 2015] [wsgi:warn] [pid 5920:tid 344] mod_wsgi: Compiled for Python/3.4.3.
[Tue Oct 06 14:44:43.938799 2015] [wsgi:warn] [pid 5920:tid 344] mod_wsgi: Runtime using Python/3.4.2.
[Tue Oct 06 14:44:43.969999 2015] [mpm_winnt:notice] [pid 5920:tid 344] AH00354: Child: Starting 64 worker threads.

The compiled version (3.4.3) differs from the runtime one (3.4.2), just like in the cases reported with mod_python (but I'm not using mod_python). The issue is somehow related to BeautifulSoup. Does anyone have any idea?

EDIT: By the way, when I don't use BeautifulSoup, I get no warnings at all, so the SSL warnings are puzzling as well.
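The two wsgi:warn lines mean the mod_wsgi module was built against Python 3.4.3 while the Python library it loaded at run time reports 3.4.2. A hedged way to see what the application actually runs under is a throwaway diagnostic script mounted with WSGIScriptAlias in place of example.py. The sketch below is hypothetical (not from the original post); it reports `sys.version`, `sys.executable`, `sys.prefix`, where `bs4` would be imported from, and the `mod_wsgi.application_group` key that mod_wsgi adds to `environ`:

```python
import importlib.util
import sys


def application(environ, start_response):
    """Hypothetical diagnostic app: which interpreter is running, and can it see bs4?"""
    bs4_spec = importlib.util.find_spec("bs4")  # None if bs4 is not importable
    lines = [
        "sys.version: " + sys.version,
        "sys.executable: " + str(sys.executable),
        "sys.prefix: " + sys.prefix,
        "bs4 found at: " + (str(bs4_spec.origin) if bs4_spec is not None else "NOT importable"),
        # Set by mod_wsgi; prints None when run outside mod_wsgi
        "mod_wsgi.application_group: " + repr(environ.get("mod_wsgi.application_group")),
    ]
    body = "\n".join(lines).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

If `bs4` shows as not importable, or is found under a different prefix than expected, the package was probably installed into a different Python than the one mod_wsgi loads.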

Problems running beautifulsoup4 within Apache/mod_python/Django

I was trying to render an HTML page on the fly using BeautifulSoup 4 in Django (served by Apache2 with mod_python). However, as soon as I pass any HTML string to the BeautifulSoup constructor (see the code below), the browser just hangs waiting for the web server. Equivalent code works like a charm from the CLI, so I'm guessing the problem is related to the environment BeautifulSoup runs in, in this case Django + Apache + mod_python.

import bs4
import django.shortcuts as shortcuts

def test(request):
    s = bs4.BeautifulSoup('<b>asdf</b>')
    return shortcuts.render_to_response('test.html', {})

I installed BeautifulSoup 4 with pip (pip install beautifulsoup4). I also tried installing BeautifulSoup 3 from the standard Debian package (apt-get install python-beautifulsoup), and with it the following equivalent code works fine, both from the browser and from the CLI.

from BeautifulSoup import BeautifulSoup
import django.shortcuts as shortcuts

def test(request):
    s = BeautifulSoup('<b>asdf</b>')
    return shortcuts.render_to_response('test.html', {})
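One difference worth checking between the two snippets: the bs4 call names no parser, and per the bs4 documentation it then picks the best parser installed, preferring lxml, then html5lib, then Python's built-in html.parser; BeautifulSoup 3 has no such pluggable parser selection. Whether the chosen parser matters under mod_python is an assumption to test, not a confirmed cause. The hypothetical helper below approximates that choice by checking which modules are importable, without importing bs4 itself:

```python
import importlib.util


def likely_default_parser():
    """Approximate which parser bs4 uses when none is named.

    Follows the preference order given in the bs4 documentation: lxml, then
    html5lib, then Python's built-in html.parser. This only checks whether
    the modules are importable; it does not import bs4 or the parsers.
    """
    for module_name in ("lxml", "html5lib"):
        if importlib.util.find_spec(module_name) is not None:
            return module_name
    return "html.parser"  # part of the standard library, always available
```

Calling it from inside the view (and logging the result) shows what the server's interpreter sees, which may differ from the CLI. If it reports lxml, a simple experiment is to name the parser explicitly, e.g. `bs4.BeautifulSoup('<b>asdf</b>', 'html.parser')`, and see whether the hang goes away. `importlib.util.find_spec` needs Python 3.4+; if the server runs Python 2, `imp.find_module` is the closest equivalent.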

I have looked in Apache's access and error logs, and they show no information about what happens to the stalled request. I have also checked /var/log/syslog and /var/log/messages, but found nothing further.

Here's the Apache configuration I used:

<VirtualHost *:80>
    DocumentRoot /home/nandersson/src
    <Directory /home/nandersson/src>
        SetHandler python-program
        PythonHandler django.core.handlers.modpython
        SetEnv DJANGO_SETTINGS_MODULE app.settings
        PythonOption django.root /home/nandersson/src
        PythonDebug On
        PythonPath "['/home/nandersson/src'] + sys.path"
    </Directory>

    <Location "/media/">
        SetHandler None
    </Location>
    <Location "/app/poc/">
        SetHandler None
    </Location>
</VirtualHost>

I'm not sure how to debug this further, or whether it's a bug at all. Does anyone have ideas on how to get to the bottom of this, or has anyone run into similar problems?
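For a request that stalls without any log output, one way to find out where it is stuck is the standard library's `faulthandler` module (Python 3.3+; a backport of the same name exists on PyPI for Python 2). `faulthandler.dump_traceback_later` arms a watchdog that writes every thread's stack to a file once the timeout expires, even when the Python call never returns. The wrapper below, `call_with_watchdog`, is a hypothetical helper, not part of Django or BeautifulSoup:

```python
import faulthandler


def call_with_watchdog(func, timeout, log_file, *args, **kwargs):
    """Run func(*args, **kwargs). If it is still running after `timeout`
    seconds, faulthandler writes the stack of every thread to log_file.

    log_file must be a real file object: faulthandler writes to its fileno().
    """
    faulthandler.dump_traceback_later(timeout, exit=False, file=log_file)
    try:
        return func(*args, **kwargs)
    finally:
        # Disarm the watchdog whether func returned or raised
        faulthandler.cancel_dump_traceback_later()
```

In the view from the question, one might open a writable log file (a hypothetical path such as /tmp/bs4-hang.log) as `log` and call `call_with_watchdog(bs4.BeautifulSoup, 10, log, '<b>asdf</b>')`. If the constructor stalls for more than ten seconds, the log should show the line each thread is blocked on.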