badsegue.org

April 4, 2005

Google and MSN GPX GPS Waypoint Extraction

Category: GPS,Software — badsegue @ 8:51 am

background

Previously we showed how to pull GPS waypoints from MSN Yellow Pages and Google Maps results using a Firefox bookmarklet. Now this can be done with Internet Explorer as well. The original solution was inadequate because of the IE limit on the length of bookmarks.

approach

To support IE the bookmarklets must be implemented using the ‘script injection’ technique. This involves rewriting the head of the HTML page to add a reference to an external script. The external script can be as long as needed. The drawback is that the external script must be fetched before the bookmarklet can do its job. This adds a small delay, and a dependency on the site hosting the script. So for Firefox users the standalone bookmark is probably preferable.

implementation

The new bookmarklets simply contain the code necessary to inject the external script into the page. Just right-click or drag them to add to your favorites/bookmarks. To use them visit the appropriate page and select the bookmark.

Here is the new MSN bookmarklet, and the code:

javascript:
(function(){
  var script=document.createElement('script');
  script.src='http://badsegue.org/samples/msngpxrip.js';
  document.getElementsByTagName('head')[0].appendChild(script);
}
)()

Here is the new Google bookmarklet, and the code:

javascript:
(function(){
  var script=document.createElement('script');
  script.src='http://badsegue.org/samples/googgpxrip.js';
  document.getElementsByTagName('head')[0].appendChild(script);
}
)()
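
Since the two bookmarklets differ only in the script URL, the same injector could be generated for any hosted script. Here is a minimal sketch of that idea. It is not part of the original bookmarklets, and the helper name makeInjectorBookmarklet is made up for illustration.

```javascript
// Hypothetical helper (not from the original article): builds the text of an
// injector bookmarklet for a given external script URL.
function makeInjectorBookmarklet(scriptUrl) {
  // JSON.stringify produces a quoted, escaped string literal for the URL.
  var body =
    "(function(){" +
    "var script=document.createElement('script');" +
    "script.src=" + JSON.stringify(scriptUrl) + ";" +
    "document.getElementsByTagName('head')[0].appendChild(script);" +
    "})()";
  return "javascript:" + body;
}

var msnBookmarklet = makeInjectorBookmarklet('http://badsegue.org/samples/msngpxrip.js');
console.log(msnBookmarklet);
```

The result uses double quotes around the URL, so they would need to be encoded (for example as %22) before the text is placed inside an HTML href attribute.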

The code of the actual extractor has changed a little bit. The Google extractor has to account for DOM differences to get to the data. Notice that the function is invoked at the end of the script, and not in the bookmarklet. This ensures that the function is defined by the time it is called.

function googrip(){
  var t;
  // The first test is the IE-style frame access; the second is the path the
  // earlier Firefox-only bookmarklet used.
  if (document.vp && (t=document.vp.document.scripts[0].text))
  {} else if (document.getElementById('vp') &&
      (t=document.getElementById('vp').contentDocument.getElementsByTagName('SCRIPT').item(0).text))
  {} else {
    alert ("Nothing found here.  Are you at maps.google.com? You are at " + document.location);
    return(0);
  }
  // Same pattern as the earlier Google Maps extractor
  var pts=t.match(/<point .*?<\/title>/g);
  if (pts) {
  ... extraction code removed.  see previous articles...
  } else { alert ("Nothing found here.") }
}
googrip();

caveats

These bookmarklets may or may not work with Firefox. They work for me when I use Firefox from home, but not at work. In theory these particular scripts shouldn’t work at all, because they attempt to open a new browser window, and according to the Mozilla documentation, scripts or pages loaded from one domain can’t manipulate pages loaded from another domain. If you encounter problems using these scripts, just use the ones shown in the earlier articles.

• • •

March 21, 2005

Photomosaics and a Google Image Grabber

Category: Photo,Software — badsegue @ 1:45 am

background

You’ve probably seen photomosaics before.

Mosaic (scaled way down)

Original image (full size)

These are images that are composed of other images. There are several free/cheap programs out there that can take a given picture and make a mosaic using a set of pictures of your choosing. The one I’ve used and had great results with is AndreaMosaic. It’s free and easy to use.

Here’s a sample mosaic I made. This is a scaled down version of the original, which is around 8MB. The original image is only 70×70, and the component images are thumbnail sized, around 100×100.

You don’t need high-resolution images to make a mosaic, but the resulting image can have enough detail to produce poster-sized prints. I’ve made 24×30 prints using nothing more than a low-resolution source image and a bunch of thumbnails.

Once you figure out the basic approach and dimensions needed for the final images, you can be producing mosaics in a matter of minutes. The hardest part is coming up with enough feeder images to give the mosaic enough

approach

If you’ve got hundreds or thousands of images and you want to use those in the mosaic, then you may not need to find any more feeder images. I like to use images related to the original’s subject matter, rather than just any image (although that can be interesting as well). So for the holiday dog picture I wanted holiday and dog pictures. The natural place to look was Google Images. You can search on anything and find any number of relevant images, already in thumbnail size on the results page. Since thumbnails are the perfect size for feeding into the mosaic, there is no need to go to the host page and download the full-size version.

implementation

This Perl program takes a search term and a range, then fetches the matching images from Google Images. It saves them into a folder with the same name as the search term, in your current directory. The images are saved using the URL of the image, so if you re-run the search it won’t fetch an image it’s already stored.

Usage: get.pl <search term> [start range] [end range]

search term: This is the query string passed to Google Images.  It can be whatever you want, but if it is more than one word then you have to put the term in quotes.  You can use the Google query language, like "flower AND rose", "rose -wine", etc.

start range: The starting index to retrieve.  Google returns 20 images per page, so this will start retrieving the page that contains the start range image.

end range: The ending index to retrieve.  The program will stop once it retrieves the page that contains the end range image.

Use start-end to control how many images to fetch. Usually you will just do something like

get.pl "flower" 0 100

If you later wanted to get more images of that type you can do

get.pl "flower" 100 500

This will avoid the images you’ve already retrieved and save you some time.

Because you’re only downloading the thumbnails the program is usable even on dial-ups.
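
To make the range arguments concrete, here is a small JavaScript sketch of which result pages a given range would touch. It assumes 20 results per page, as stated above, and mirrors the loop in the Perl program below: a page is fetched, the index advances by 20, and the loop stops once the index reaches the end range. The function name pageOffsets is illustrative only, and the sketch assumes a next page always exists.

```javascript
// Illustrative only: lists the "start" offsets that would be requested for a
// range, assuming 20 thumbnails per results page.
function pageOffsets(start, end, perPage) {
  perPage = perPage || 20;
  var offsets = [];
  var idx = start || 0;
  do {
    offsets.push(idx);   // fetch the page beginning at this index
    idx += perPage;      // a "next page" link advances the index
  } while (end && idx < end);
  // With no end range the real stopping point depends on the results,
  // so this sketch only reports the first page in that case.
  return offsets;
}

console.log(pageOffsets(0, 100).join(', '));   // 0, 20, 40, 60, 80
console.log(pageOffsets(100, 500).length);     // 20
```

So a second run with 100 500 picks up exactly where a first run with 0 100 left off.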

use HTML::Parser;
use HTTP::Request::Common;
use LWP;
use URI::Escape;

use strict;

$|=1;

my $client = LWP::UserAgent->new(agent=>'Mozilla', timeout=>'0', keep_alive=>1);
my $ua    = "Mozilla";
my $in    = "./";
my $query = shift; chomp($query);
my $start_idx = shift; chomp($start_idx);
my $end_idx = shift; chomp($end_idx);
my $url   = "http://images.google.com/images?q=$query+filetype:jpg\&safe=off";
my $start = $start_idx || "0";
my $stop = $end_idx || 0;
my $dest_dir = "$in/" . uri_escape ($query);

my $count = 1;

my $p = new HTML::Parser (
 api_version => 3,
 start_h     => [\&tag, "tagname, attr"],
);

print "Start = $start, Stop = $stop, Query = $query\n";
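# "mkdir X || die" never dies because || binds to X; use "or", and skip directories that already exist
-d $in or mkdir $in or die "Couldn't make $in ($!)\n";
-d $dest_dir or mkdir $dest_dir or die "Couldn't make $dest_dir ($!)\n";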


while (1) {
  my $test = $start;
 
  # Get the search results page
  my $request = HTTP::Request->new('GET', "${url}\&start=${start}");
  my $response = $client->request($request);
  
  $p->parse( $response->content );
  # See if we are out of images
  if ($test == $start || ($stop && ($start >= $stop))) {
    print "Done.\n";
    exit 0;
  }
}

sub tag {
  my ($tagname, $attr) = (@_);

  # Found the next page graphic, increment counter to continue grabbing
  if ($attr->{'src'} && ($attr->{'src'} eq "/nav_next.gif" )) {
        $start += 20;
  }

  return unless ($tagname eq 'img');
  return unless ($attr->{'src'} && $attr->{'src'} =~ /images\?q=tbn:.*\.jpg/i);
  my $filename = $attr->{'src'};
  $filename =~ s/\/images\?q=tbn:.*://;
  $filename = uri_escape($filename);

  if (-e "$dest_dir/$filename") {
    print "Skipping ";
  } else {
    my $request = HTTP::Request->new('GET', "http://images.google.com$attr->{'src'}");
    my $response = $client->request($request, "${dest_dir}/${filename}");
  }
  print "$filename (", $count++, ")\n";
}
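
The skip-if-already-downloaded behavior depends on turning each thumbnail URL into a stable file name. As a rough JavaScript illustration of that step: the sample src value is invented, and encodeURIComponent is only an approximation of Perl's uri_escape, which may escape a slightly different set of characters.

```javascript
// Sketch of the file-naming step: strip the "/images?q=tbn:<id>:" prefix,
// then escape what is left so it can be used as a single file name.
function thumbnailFilename(src) {
  var name = src.replace(/\/images\?q=tbn:.*:/, '');
  return encodeURIComponent(name);
}

// Hypothetical thumbnail src, in the shape the Perl regex expects.
var sample = '/images?q=tbn:abc123:www.example.com/dogs/holiday.jpg';
console.log(thumbnailFilename(sample)); // www.example.com%2Fdogs%2Fholiday.jpg
```

Because the same URL always maps to the same name, a simple file-exists check is enough to avoid fetching a thumbnail twice.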

• • •

March 20, 2005

Google Maps GPS GPX Waypoint Extractor

Category: GPS,Software — badsegue @ 1:17 am

background

I previously wrote about how to extract waypoints and create a GPX file from MSN Yellow Pages using a bookmarklet. This article explains how to do the same thing for Google Maps. I use mine to supplement the points of interest (POIs) in North American CitySelect v5 for my Garmin 76C.

POI coverage can be spotty, even in well-established and stable cities. In newly developed or more remote areas there may be nothing at all. Online directories should have just about every business that would appear in the Yellow Pages, and are much more timely than the software releases. By tapping these online resources you can have the most accurate and complete set of business POIs possible.

approach

This can be done relatively easily because the search results are all contained on a single page, with the relevant data unencoded. Yahoo and most of the other online providers lack this ease of access. Even Google Local doesn’t put all the information on a single page; you’d have to drill down into each returned place to get the coordinates.

implementation

The code looks like this:

javascript:
(function(){
  var t=document.getElementById('vp').
        contentDocument.getElementsByTagName('SCRIPT').item(0).text;
  var pts=t.match(/<point .*?<\/title>/g);
  var doc=open().document;
  var bod=doc.body;
  doc.write('<textarea rows=%2250%22 cols=%22100%22>');
  doc.write('\n<gpx xmlns=%22http://www.topografix.com/GPX/1/1%22 ' +
            'creator=%22gpxextr%22 version=%221.1%22 ' +
            'xmlns:xsi=%22http://www.w3.org/2001/XMLSchema-instance%22>');
  for(i=0;i<pts.length;i++){
    var latlon = pts[i].match(/(-?\d{2}\.\d{6}).*?(-?\d{2}\.\d{6}).*?title.*?>(.*?)<\/title>/);
    latlon[3] = latlon[3].replace(/<.*?>/g, '');
    doc.write('\n<wpt lat=%22', latlon[1], '%22 lon=%22', latlon[2],
    '%22>\n<name>', latlon[3], '</name>\n</wpt>');
  }
  doc.write('\n</gpx></textarea>');
  doc.close();
}
)()

The Google Maps GPX Waypoint Extractor link (Firefox only) runs a little JavaScript bookmarklet that parses the interesting parts of the results page and writes them as a GPX file into a new browser window. (The MSN and Google extractors only work in Firefox/Mozilla for now. IE limits the length of a bookmark, and right now the code is too long. There’s a way around this, but it requires putting the code into a file and having the bookmarklet ‘inject’ it into the page. I haven’t got that working yet, though.)

If you click on the link, the script won’t find anything on this page. Add the link (right-click and add it, or drag the link to your toolbar) then open Google Maps and do a search for “Pizza Duck,NC”. Now select the GPS GPX Extractor bookmark and you should get waypoints for all the results on the page. You should be able to import that file into most software that manages waypoints.
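
The extraction itself can be tried outside the browser. The sketch below applies the same two regular expressions as the bookmarklet to a plain string and returns the GPX text instead of writing it to a new window. The function name pointsToGpx and the sample data are invented for illustration, the GPX header is trimmed, and the real page markup may differ.

```javascript
// Sketch: the bookmarklet's two regular expressions, applied to a string.
function pointsToGpx(text) {
  var pts = text.match(/<point .*?<\/title>/g) || [];
  var out = ['<gpx xmlns="http://www.topografix.com/GPX/1/1" creator="gpxextr" version="1.1">'];
  for (var i = 0; i < pts.length; i++) {
    var m = pts[i].match(/(-?\d{2}\.\d{6}).*?(-?\d{2}\.\d{6}).*?title.*?>(.*?)<\/title>/);
    if (!m) continue;                        // skip entries that do not fit the pattern
    var name = m[3].replace(/<.*?>/g, '');   // drop any markup inside the title
    out.push('<wpt lat="' + m[1] + '" lon="' + m[2] + '">\n<name>' + name + '</name>\n</wpt>');
  }
  out.push('</gpx>');
  return out.join('\n');
}

// Invented sample data in the general shape the patterns expect.
var sampleText =
  '<point lat="36.169300" lng="-75.755400"/><title xml:space="preserve">Duck <b>Pizza</b></title>';
console.log(pointsToGpx(sampleText));
```

Unlike the bookmarklet, this version falls back to an empty list when nothing matches, so it returns an empty GPX document rather than failing.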

etc…

MSN Yellow Pages and Google Maps are the only sites I know of that have easily parsed coordinates. I think Google Local searches can be tapped as well, but it would require fetching each detail page. There are online GIS sources that can be used to get other types of waypoints, like parks and such. If you know of other sources of information that can be extracted like this, let me know.

• • •

March 14, 2005

MSN GPX GPS Waypoint Extractor

Category: GPS,Software — badsegue @ 1:09 am

background

If you have a mapping GPS you may have noticed that the map sets you have are missing lots of points of interest. While planning the annual trip to the beach I noticed that there were no POIs for the area in City Select North America v5. This is the map set I use on my Garmin 76C. The only update available is the next version, which I can get for $75, but is unlikely to be much better.

What I needed was a simple way to get accurate and up to date POIs for any area that I know I’m traveling to.

approach

Obviously if you’re looking for a type of business somewhere, you’re going to just search online. There are plenty of options that provide yellow pages, and can serve maps and driving directions for any given place.

I don’t need the maps or directions–the GPS handles that. I just need the latitude and longitude, preferably with multiple results on a single page. I don’t want to have to drill down to a different page for each returned result to get the coordinates.

A quick survey turned up MSN and Google as the most promising candidates. Google has their Local search, and the Maps beta. The Local search can return the largest number of results, but the coordinates in the page are for the center of the search region, not the results. The Maps search (dissected here) has the right data, but is limited to 10 results and I don’t see any way to get any more. That leaves MSN, which has the right data and also has other useful search options, like the ability to search within a radius of a specific address.

implementation

MSN Yellow Pages produces result details like this

Each matched business has a map link which has the latitude, longitude, and name.
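
As a rough illustration of pulling those three fields out of a link, here is a sketch that uses the same patterns as the bookmarklet shown later in this article. The example link is invented: only the lat=, POI1lng=, POI1name= and street pieces come from the bookmarklet's patterns, and the host, the other parameter names and the values are assumptions. It also uses decodeURIComponent where the bookmarklet uses unescape.

```javascript
// Sketch: extract the raw fields from a map link using the bookmarklet's patterns.
function parseMapLink(href) {
  var latlon = href.match(/lat=([-\d]*)&POI1lng=([-\d]*)/);
  var nm = href.match(/POI1name=(.*?)&street/);
  if (!latlon || !nm) return null;   // not a map link
  // Turn "+" into spaces, decode %xx escapes, and avoid a bare "&" in the XML.
  var name = decodeURIComponent(nm[1].replace(/\+/g, ' ')).replace(/&/g, 'and');
  return { rawLat: latlon[1], rawLon: latlon[2], name: name };
}

// Invented example link (hypothetical host and values).
var href = 'http://maps.example.com/map?POI1lat=036169300&POI1lng=-075755400' +
           '&POI1name=Duck+Pizza+%26+Subs&street=1+Main+St';
console.log(parseMapLink(href));
```

Links that lack any of the fields return null, which is how ordinary links on the page get skipped.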

I started with a Perl script that took a search term, made the HTTP connection, and parsed the results. I did something similar to download thumbnails from Google image searches, for feeding into a photo mosaic. This approach works ok, but it would be nicer to have it tied to the browser results page without having to run an external script. That way I can tweak the search interactively until I find what I want. MSN searches sometimes return a category page which you have to go beyond to get to the link lists, so running directly from the browser is useful.

A bookmarklet works well for this job. Bookmarklets are bits of Javascript that have been saved as a bookmark. When activated they run in the context of the current page in the browser, as if they were part of the page itself.

This MSN GPX Waypoint Extractor link (Firefox only) runs a little JavaScript bookmarklet that parses the interesting parts of the map link and writes them as a GPX file into a new browser window. (The MSN and Google extractors only work in Firefox/Mozilla for now. IE limits the length of a bookmark, and right now the code is too long. There’s a way around this, but it requires putting the code into a file and having the bookmarklet ‘inject’ it into the page. I haven’t got that working yet, though.)

If you click on the link, the script will find the sample map link in this page, which isn’t that useful. Add the link (right-click and add it, or drag the link to your toolbar) then open the detailed search results of pizza places in Duck, NC. Now select the GPS GPX Extractor bookmark and you should get waypoints for all the results on the page.

The code looks like this:

javascript:
(function(){
  var i,x,h,n;
  var doc=open().document;
  var bod=doc.body;
  doc.write('<textarea rows=%2250%22 cols=%22100%22>');
  doc.write('\n<gpx xmlns=%22http://www.topografix.com/GPX/1/1%22 ' +
            'creator=%22gpxextr%22 version=%221.1%22 ' +
            'xmlns:xsi=%22http://www.w3.org/2001/XMLSchema-instance%22>');
  var links = document.getElementsByTagName('a');
  for(i=0;i < links.length; i++) {
    x=links[i];
    h=x.href;
    var latlon = h.match(/lat=([-\d]*)&POI1lng=([-\d]*)/);
    var nm = h.match(/POI1name=(.*?)&street/);
    if (latlon != null && !h.match(/^javascript/) && nm != null) {
      n = nm[1].replace(/\+/g, ' ');
      n = unescape(n);
      n = n.replace(/&/g, 'and');
      latlon[1] = latlon[1].replace(/0(\d\d)(\d*)/, '$1.$2');
      latlon[2] = latlon[2].replace(/0(\d\d)(\d*)/, '$1.$2');
      doc.write('\n<wpt lat=%22', latlon[1], '%22 lon=%22', latlon[2],
      '%22>\n<name>', n, '</name>\n</wpt>');
    }
  }
  doc.write('\n</gpx></textarea>');
  doc.close();
}
)()

There is a little bit of manipulation using regular expressions needed because the coordinates have a leading zero and no decimal.
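
Here is that coordinate fix on its own, as a small sketch. The raw values are invented examples in the shape described above, and the function name fixCoord is just for illustration.

```javascript
// Same replace() as the bookmarklet: drop one leading zero and put a decimal
// point after the next two digits.
function fixCoord(raw) {
  return raw.replace(/0(\d\d)(\d*)/, '$1.$2');
}

// Invented raw values: leading zero, no decimal point.
console.log(fixCoord('036169300'));   // 36.169300
console.log(fixCoord('-075755400'));  // -75.755400
```

Note that the pattern keeps exactly two digits in front of the decimal point, so it assumes two-digit degrees.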

Save the file and open it up in some program that understands GPX files, which should be most current versions of GPS map/waypoint software. I used G7toWin to make them all restaurants and send them to the 76C.

Here are the pizza places centered around Duck:

Here they are from the Find menu:

You can also open them up in USAPhotomaps and get the satellite views:
(Topo view)
(Sat view)
