Setting Up web2py2 on DreamHost

Reference: http://www.bikmort.com/dokuwiki/web2py_on_dreamhost by Tim Korb. Accessed 2017-10-14.

Tim's directions worked very well for me with minor tweaks.

  • Create a fully-hosted domain at http://panel.dreamhost.com. I used web2py.guaminsects.net.

  • Site configuration options include:

    • Remove WWW from beginning of name
    • use PHP 7.0 FastCGI (default)
    • use HTTPS
    • use Passenger (Click OK on dialog about /public)
  • Use a shell account to open a terminal window on your DreamHost server.

    ssh <user>@web2py.guaminsects.net
    
  • Create a virtual python environment in ~/python (could go elsewhere, but remember the path):

    cd ~
    virtualenv python
    
  • Clone sources from the Web2Py github site into a top level directory called web2py:

    git clone --recursive https://github.com/web2py/web2py.git
    cp -a web2py/. web2py.guaminsects.net/
    
  • Copy handlers/wsgihandler.py to passenger_wsgi.py:

    cd web2py.guaminsects.net
    cp handlers/wsgihandler.py passenger_wsgi.py
    
  • Edit passenger_wsgi.py. Insert these two lines at immediately before the line "if not os.path.isdir('applications'):". Note that the INTERP path points to the virtualenv created earlier:

    INTERP = os.path.join(os.environ['HOME'], 'python', 'bin', 'python')
    if sys.executable != INTERP: os.execl(INTERP, INTERP, *sys.argv)
    
  • Create tmp/restart.txt. Touch this file to force relaunch of Web2Py via Passenger:

    mkdir tmp
    touch tmp/restart.txt
    
  • Set an admin password. This command will give errors (for among other reasons, a user process cannot listen on port 443), but it will have generated the necessary parameters_443.py file.

    python web2py.py -p 443 -a "secret-secure-password"
    
  • The Web2Py environment should now be operational. Visit https://web2py.guaminsects.net to see it.

  • Clean up. Delete the top level directory where the web2py git repository was downloaded:

    cd ~
    rm -rf web2py
    

Migrate a MySQL database to postgreql

Ran into some problems using MySQL on Dreamhost as the online database manager for my web2py PestList app. Decided to upgrade the database to postgresql. DreamHost doesn't support pg, so I had to revert to running pg on localhost. After several attempts to convert the MySQL tables to pg, I came across pgloader, which did the trick. I simply created an empty pg database and ran the following command.

$ pgloader mysql://user:pass@mysql.guaminsects.net/pestlist postgresql://user:pass/localhost/import_test

Converting a web2py web site to a static web site

I am building a web site using the web2py framework. The site uses data stored in MySQL database to build an index page of crops. For each crop, there is a link to a page which is built to display all pests attacking the crop.

I would like to host a copy of this as a static site in GitHub pages. Creating the static site was surprisingly easy:

  1. Run the web2py site in a local test browser.
  2. In an empty folder, use wget to save local a local copy of every page generated by the site:
wget --recursive --no-clobber --page-requisites \
 --adjust-extension --convert-links \
 --restrict-file-names=windows \
 --no-parent \
  http://127.0.0.1:8000/pestlist/default/crop_index
  1. On the crop_index.html page, change links to the pest_index.html pages. Easily done using relace all in atom or some other editor.
  2. Upload all files to a GitHub repo and enable GitHub pages for the repo.
  3. Create an index.html page in the top level folder of the repo which links to crop_index.html.

Calculate geographical coordinates for equidistant points at vertices of a triangular grid

I needed to set up a grid of equally spaced insect pheromone traps. While it is convenient to put traps at the vertices of a rectangular grid, this is not optimal because nearest neighbors on the diagonals are further away than those within rows or columns (If the spacing within rows and columns is X, then the diagonal spacing is sqrt(2)*X). For equidistant spacing, points are placed at vertices of a grid made with equilateral triangles. This python script calculates such a grid of points and returns the data as a KML file containing latitude/longitude coordinates in decimal degrees.

The KML file may be displayed using Google Earth.

example output

Using Scrapy to Extract Scientific Names from PestNet Fact Sheets

PestNet serves a couple of hundred excellent pest fact sheets on its site at http://www.pestnet.org/fact_sheets/index.htm. Unfortunately, these are indexed only by vernacular names. I want to hyperlink to sheets using scientific names. So I wrote a scrapy script to crawl the site and scrape the section from each page which contains scientific names.

from scrapy.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors import LinkExtractor
import re

class someSpider(CrawlSpider):
    name = 'crawltest'
    allowed_domains = ['www.pestnet.org']
    start_urls = ['http://www.pestnet.org/fact_sheets/index.htm']
    rules = ( Rule(LinkExtractor(allow=()), follow=True, callback='parse_item'), )


    # Let's capture scientific names
    def parse_item(self, response):
        log = 'scientific_names_log.md'
        s = str(response.body)
        searchObj = re.search( r'<a name="Scientific Name"(.*?)<a name=', s, re.M|re.I)
        if searchObj:
            result = searchObj.group(1)
        else:
            result = "Nothing found!!"
        with open(log, 'a') as f:
            f.write('{} {}\n'.format(response.url, result))
        return

The script was invoked from the terminal using:

scrapy runspider scrapePP.py -s DEPTH_LIMIT=1

Here are the first few lines from scientific_names_log.md:

http://www.pestnet.org/fact_sheets/mini/index.htm Nothing found!!
http://www.pestnet.org/fact_sheets/batiki_blue_grass_eye_spot_207.htm ></a><h1 class="" style="False">Scientific Name</h1><P><EM>Curvularia ischaemi</EM></P>
http://www.pestnet.org/fact_sheets/bean_pod_borer_037.htm ></a><h1 class="" style="False">Scientific Name</h1><P><EM>Maruca</EM> <EM>vitrata</EM>; it used to be known as <EM>Maruca testulalis.</EM></P>
http://www.pestnet.org/fact_sheets/bean_lace_bug_253.htm ></a><h1 class="" style="False">Scientific Name</h1><P><EM></EM>&nbsp;<EM>Corythucha gossypii</EM></P>
http://www.pestnet.org/fact_sheets/bean_phaseolus_rust_217.htm ></a><h1 class="" style="False">Scientific Name</h1><P><EM>Uromyces appendiculatus</EM> \r\nvar. <EM>appendiculatus.</EM> Previously <EM>Uromyces \r\nphaseoli.</EM> </P>

As you can see, there was still some work to be done to clean this up. I used the atom editor to delete the extraneous bits and saved urls and scientific names to pacific_pests_insects.csv. Note that this file contains info only on arthropod pests.

Finding Lost Parents

Finding Missing Parents in a Child-Parent Self-Referencing Table

In [1]:
import pymysql
from sqlalchemy import *
import getpass

Pattern: Find items in one list which does not occur in a second list

In [2]:
# To find parent_tids which do not have corresponding tids,
# convert lists to sets and calculate the difference.

tid = ["a", "b", "c", "d", "e"]
parent_tid = ["a", "f", "c", "m"]
list(set(parent_tid) - set(tid))
Out[2]:
['m', 'f']

Repeat using data from database

In [3]:
password = getpass.getpass('Database password: ')
Database password: ········
In [5]:
s = 'mysql+pymysql://aubreymoore:{}@localhost/pestlist'.format(password)
db = create_engine(s)
In [6]:
rs = db.execute("select tid, parent_tid from taxon2;")
tid_list = []
parent_tid_list = []
for r in rs:
    tid_list.append(r.tid)
    parent_tid_list.append(r.parent_tid)
In [7]:
tid_list[:10]
Out[7]:
['6',
 '7707728',
 '196',
 '1169',
 '7683',
 '2766430',
 '2766636',
 '220',
 '407',
 '6688']
In [8]:
parent_tid_list[:10]
Out[8]:
['#',
 '6',
 '7707728',
 '196',
 '1169',
 '7683',
 '2766430',
 '7707728',
 '220',
 '407']
In [9]:
missing_parents = list(set(parent_tid_list) - set(tid_list))
missing_parents
Out[9]:
['2002379', '#', '1890281']

Results

There are two missing parent_tids. Note that '#' is the parent of root nodes, so this is OK.

A constraint needs to be added to the taxon2 table to prevent entry of a parent_tid which does not match an existing tid.

In [ ]:
 

Using the Species API to Mine the GBIF Backbone Taxonomy

Here's how to get tons of info from GBIF.

Check out the Species API.

First step is to locate the taxon in the GBIF Backbone Taxonomy. The GBIF Backbone Taxonomy, often called the Nub taxonomy, is a single synthetic management classification with the goal of covering all names GBIF is dealing with.

You can search GBIF manually by going to http://www.gbif.org/species and entering a scientific name or common name. The magic number you are looking for is the GBIF ID.

Alternatively, you can use the Species API: http://api.gbif.org/v1/species/search?q=lawn%20armyworm&rank=SPECIES. This will return info encoded as JSON. The magic number in this case is nubKey.

Once we have the GBIF ID, which is 5109848 for Spodoptera mauritia, we can harvest more data:

Name usage: http://api.gbif.org/v1/species/5109848
Synonyms: http://api.gbif.org/v1/species/5109848/synonyms
Vernacular names: http://api.gbif.org/v1/species/5109848/vernacularNames
Media: http://api.gbif.org/v1/species/5109848/media
References: http://api.gbif/v1/species/5109848/references
Distributions: http://api.gbif.org/v1/species/5109848/distributions
Descriptions: http://api.gbif.org/v1/species/5109848/descriptions

Setting Up an Online Weather Station

Was asked to set up an on-line weather station for the University of Guam's Agricutural Experiment Station at Yigo, Guam. The weather station is a Davis Vantage Pro 2 Plus on a mast mounted on the roof of a 40 foot shipping container. The container is equipped with internet access via a cable modem. Data is fetched from the weather instruments using a Davis Vantage Pro 2 console fitted with an optional data logger.

Decided to use a Raspberry Pi 3 to read data from the weather station console via USB. Data is stored using weewx software, which also takes care of sending it to Weather Underground. The RP is connected to the modem by a cable.

Step 1: Install weewx on RPi

Step 2: Remove the Fake Clock on the RPi

Following the suggestion in https://github.com/weewx/weewx/wiki/Raspberry-Pi, I deleted the fake hardware clock from the RPi:

$ sudo apt-get purge fake-hwclock

This forces weewx to wait until a software clock is set from the internet connection before resuming after a power outage.

Step 3: Create a Weather Underground Personal Weather Station and Configure weewx to Feed It

Followed these directions: https://publiclab.org/notes/amysoyka/06-20-2014/how-to-set-up-your-weather-station-and-stream-it-to-the-internet

WU assigned KYIGO4 as the weather station ID.

Step 4: Access Weather Station Online

The weather station is online at: https://www.wunderground.com/personal-weather-station/dashboard?ID=KYIGO4

This page automatically updates every few seconds.

Weather Underground PWS KYIGO4

Notes

All outdoor sensors were replaced on April 21, 2017 because the temperature/humidity senor was not working.

Install web2py in a Conda Virtual Environment

Here's how to get web2py installed in a virtual environment using conda instead of virtualenv.

Info on conda environments is available here: https://uoa-eresearch.github.io/eresearch-cookbook/recipe/2014/11/20/conda/

Step 1. Create a directory for our web2py project.

cd Devel
mkdir playpen
cd playpen

Step 2. Create a virtual environment and activate it.

Note that web2py runs under python 2, not python 3.

conda create -n playpen python=2.7
source activate playpen

Step 3. Download and install web2py.

wget https://mdipierro.pythonanywhere.com/examples/static/web2py_src.zip
unzip web2py_src.zip
rm web2py_src.zip

Step 3. Install pygraphwiz (used to visualize models).

pip install pygraphviz

Step 4 -- Open web2py in a browser

(playpen) aubreymoore@aubreymoore-Aspire-7750Z:~/Devel/playpen/web2py$ python web2py.py

Step 5 -- Open web2py in a python shell

(playpen) aubreymoore@aubreymoore-Aspire-7750Z:~/Devel/playpen/web2py$ python web2py.py -S pestlist/default -M

Step 6. At the end of a session, the virtual environment can be deactivated.

source deactivate playpen

Step 7. To undo everything:

conda remove -n playpen --all
cd ..
rm -rf playpen