Post by Luc Vincent, Uber Tech Lead
We wanted to let you all know that a few months ago we quietly released (or actually re-released) an Optical Character Recognition (OCR) engine into open source. You might wonder why Google is interested in OCR. In a nutshell, we are all about making information available to users, and when this information is in a paper document, OCR is the process by which we can convert the pages of this document into text that can then be used for indexing.
This particular OCR engine, called Tesseract, was in fact not originally developed at Google! It was developed at Hewlett Packard Laboratories between 1985 and 1995. In 1995 it was one of the top 3 performers at the OCR accuracy contest organized by the University of Nevada, Las Vegas (UNLV). However, shortly thereafter, HP decided to get out of the OCR business and Tesseract has been collecting dust in an HP warehouse ever since. Fortunately some of our esteemed HP colleagues realized a year or two ago that rather than sit on this engine, it would be better for the world if they brought it back to life by open sourcing it, with the help of the Information Science Research Institute at UNLV. UNLV was happy to oblige, but they in turn asked for our help in fixing a few bugs that had crept in since 1995 (ever heard of bit rot?)... We tracked down the most obvious ones and decided a couple of months ago that Tesseract OCR was stable enough to be re-released as open source.
A few things to know about Tesseract OCR: for now it only supports the English language, and does not include a page layout analysis module (yet), so it will perform poorly on multi-column material. It also doesn't do well on grayscale and color documents, and it's not nearly as accurate as some of the best commercial OCR packages out there. Yet, as far as we know, despite its shortcomings, Tesseract is far more accurate than any other Open Source OCR package out there. If you know of one that is more accurate, please do tell us!
We are grateful to all the people at HP who made it possible to release Tesseract into open source, and especially John Burns, who championed and babysat the project. We would also like to thank the original Tesseract development team, a partial list of whom is here. Last but not least, many thanks to our friends at UNLV's ISRI, including Tom Nartker, Kazem Taghva, Julie Borsack and Steve Lumos, for all their help with this project.
By the way, we are also hiring top-notch OCR engineers! See this job posting for more information.
Wednesday, August 30, 2006
Monday, August 28, 2006
Snakes on a Sprint
Several Python developers came together at both Google's Mountain View and New York offices last week for a bi-coastal Python Sprint. Our stalwart sprinters worked on everything from enhancements to the Python-3000 interpreter to triaging bugs and improving Unicode testing for Python 2.5/2.6. If you'd like to learn more, check out Guido van Rossum's Python Sprint Report.
Wednesday, August 23, 2006
Crossing The Ubucon
Following on from last week's LinuxWorld convention in San Francisco, Google hosted The Ubucon, an informal conference for Ubuntu hackers, enthusiasts and professionals. We had about 70 members of the Ubuntu community, from novice users to Canonical staff members, in for presentations, general discussion of Linux and FLOSS, and, of course, why Ubuntu rocks. Check out The Ubucon Blog, Corey Burger's write-up of the conference, or the Ubuntu community page to learn more about what's shaping up to be an annual event at the Googleplex. Our thanks go out to John Mark Walker, the conference organizer, and the community for coming together to make The Ubucon such an awesome event.
Tuesday, August 22, 2006
New GData API: Google Base
Post by Matthias Zenger, Software Engineer
We're excited to announce the availability of the Google Base data API, which lets you write applications that dynamically interact with Google Base. You can insert, edit, or delete items programmatically, complementing existing input methods like the Google Base front-end or the bulk upload mechanism. You can also query other users' published content and access their items via the API. This enables you to create domain-specific search applications (or mash-ups) combining Google Base content with other services.
The API is RESTful and is based on the GData protocol; see the Developer Guide for detailed information about its functionality and use. Also see the interactive demo app for more usage examples.
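Because the API follows the GData protocol, each operation above corresponds to a plain HTTP request against an Atom feed: GET to query, POST to insert, PUT to edit, DELETE to delete. The sketch below shows that mapping as JavaScript request descriptors rather than live calls. FEED_URL and the edit URLs are hypothetical placeholders, only the generic GData query parameters q and max-results are used, and authentication is omitted entirely; the Developer Guide remains the authority on real feed URLs, query syntax, and login.

```javascript
// Hypothetical placeholder: take the real feed URL from the Developer Guide.
var FEED_URL = "http://example.com/feeds/items";

// Query: GET on the feed URL with generic GData query parameters.
function buildQuery(feedUrl, text, maxResults) {
  var params = ["q=" + encodeURIComponent(text)];
  if (maxResults) {
    params.push("max-results=" + maxResults);
  }
  return { method: "GET", url: feedUrl + "?" + params.join("&") };
}

// Insert: POST a new Atom <entry> document to the feed URL.
function buildInsert(feedUrl, atomEntryXml) {
  return {
    method: "POST",
    url: feedUrl,
    headers: { "Content-Type": "application/atom+xml" },
    body: atomEntryXml
  };
}

// Edit: PUT the modified entry to that entry's own edit URL.
function buildUpdate(editUrl, atomEntryXml) {
  return {
    method: "PUT",
    url: editUrl,
    headers: { "Content-Type": "application/atom+xml" },
    body: atomEntryXml
  };
}

// Delete: DELETE the entry's edit URL; no body is needed.
function buildDelete(editUrl) {
  return { method: "DELETE", url: editUrl };
}
```

For example, buildQuery(FEED_URL, "digital camera", 10) yields a GET for http://example.com/feeds/items?q=digital%20camera&max-results=10. Keeping requests as plain objects separates the protocol mapping from whichever HTTP client actually sends them.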
Monday, August 21, 2006
Code on the Road: The Google Developers Event Calendar
Post by Paul McDonald, Product Manager
Today we're announcing the launch of the Google Developers Event Calendar! You can use it to see a schedule of upcoming developer events where Google employees will be speaking about open source, Google APIs, and all things code.
Most users will find it easiest to add the Developers Event Calendar to their own Google Calendars. To do so, follow the steps below. If you have any questions about using Google Calendar, please refer to the Google Calendar Help Center.
- Click this button:
- If necessary, log into Google Calendar. Note: Logging into Google Calendar requires a Google Account. If you use Gmail, Google Groups, or other Google services, you already have a Google Account. Simply use the same login and password.
- Once you are logged in, choose Yes, add this calendar to add the Google Developers Event Calendar to your Calendar. You should now see Google Developers Event Calendar events listed on your Google Calendar.
If you'd prefer to view the Google Developers Event Calendar in another application such as a feed reader or a product that supports the iCal format (like iCal for Mac), please click the relevant link here to obtain the URL:
Friday, August 18, 2006
Google Desktop Developer Update
Today's Google Desktop update has a bit of news for Desktop Developers:
- The designer is now available in French, Italian, German, Spanish, Simplified Chinese and Japanese, and can be downloaded as part of the Google Desktop SDK
- Contest submissions are being reviewed, and the winners will be announced September 5th!
Thursday, August 17, 2006
Landing in Las Vegas
Post by Peter Deng, Product Marketing Manager
Come celebrate 40 years of Star Trek at the 5th Annual Official Star Trek Convention -- and while you're at it, learn more about Google APIs. Our API teams will be on hand at the confab in Las Vegas today through Sunday. Besides unveiling KML support for Google Maps for mobile, we'll be doing live demos of Google Earth KML, the Google AJAX Search API, Google Calendar's data API, and the Google Gadgets API.
Hope to see you there, preferably in a uniform. But if you can't appear in person, just transport yourself.
Monday, August 14, 2006
coolApp = new myCreativity(mapsAPI, searchAPI);
Post by Mark Lucovsky, Software Engineer
The Google Ajax Search API is designed to work seamlessly with the Google Maps API. One way it adds instant value is to allow your Maps-based applications to execute a search, then take the search results and plot them on a map. Our model for this is simple and straightforward — each search result is a JavaScript object that contains a number of properties including a URL, title, array of phone numbers, street address, city, latitude and longitude, etc. Therefore, adding a search result to a map is as simple as:
```javascript
// result: a local search result from the AJAX Search API; its lat and
// lng properties are strings, hence parseFloat.
// myMap: the page's Google Maps API map object.
var latLng = new GLatLng(parseFloat(result.lat), parseFloat(result.lng));
var marker = new GMarker(latLng);
myMap.addOverlay(marker);
```
The AJAX Search API team produced a number of simple sample applications to teach the basics of search-integrated Maps mashups. Two of the most popular samples are My Favorite Places and My Phone List, so take a look and see if they inspire you to add Search to your Maps mashups!
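Since lat and lng arrive as strings, a mashup that plots a whole result set usually has to parse them, drop anything that does not parse, and work out which region to show. The plain-JavaScript sketch below does just that, so it runs without the Maps API loaded. resultsToPoints and boundsOf are hypothetical helper names, the result objects are assumed to carry lat, lng, and title fields as described above, and the bounding box does not handle results that straddle the 180th meridian.

```javascript
// Hypothetical helper: turn search results into numeric points,
// skipping any result whose coordinates are missing or not numeric.
function resultsToPoints(results) {
  var points = [];
  for (var i = 0; i < results.length; i++) {
    var lat = parseFloat(results[i].lat);
    var lng = parseFloat(results[i].lng);
    if (isNaN(lat) || isNaN(lng)) {
      continue;
    }
    points.push({ lat: lat, lng: lng, title: results[i].title });
  }
  return points;
}

// Hypothetical helper: smallest lat/lng box containing every point,
// or null when there are no points. Ignores antimeridian wrap-around.
function boundsOf(points) {
  if (points.length === 0) {
    return null;
  }
  var box = {
    south: points[0].lat, north: points[0].lat,
    west: points[0].lng, east: points[0].lng
  };
  for (var i = 1; i < points.length; i++) {
    box.south = Math.min(box.south, points[i].lat);
    box.north = Math.max(box.north, points[i].lat);
    box.west = Math.min(box.west, points[i].lng);
    box.east = Math.max(box.east, points[i].lng);
  }
  return box;
}
```

In a page that loads the Maps API, each point would then become a marker exactly as in the three-line snippet above, and the box gives a starting point for choosing the visible region.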
(An example of this search-enhanced Maps mashup idea is the Google Gadget we built using these two APIs — check out the new Google Map Search Gadget.)
Related post: Add Map Search to your site
Friday, August 11, 2006
Google Web Toolkit Update
The Web Toolkit team just released a huge update. Here are a few of its new and updated features:
- Localization - Easily localize strings and formatted messages
- XML - An XML library based on the W3C DOM
- JSON - JSON is moving into gwt-user.jar, and it's much faster than 1.0.21
- FileUpload widget - The much-requested file upload widget
- FormPanel widget - Easily submit traditional HTML forms from GWT apps
- JUnit enhancements - Unit tests are much, much faster than 1.0.21, and you can now test RPCs and timers
- Automatic resource injection - Modules can contain references to external JavaScript and CSS files, causing them to be automatically loaded when the module itself is loaded
- gwt-servlet.jar - Deploy this jar to add RPC to your servlet-based webapps without having to manually crack open gwt-user.jar to remove the servlet API classes
- Javadoc-style API documentation
- Better, automatic management of browser caching of the .nocache.html and .cache.html files for your module
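The two file suffixes in the last item encode a caching policy. Roughly, in GWT 1.x each .cache.html file is named after its contents, so a rebuilt file gets a new name and can safely be cached for a long time, while the .nocache.html file keeps a fixed name and must be re-fetched so browsers discover which .cache.html to load. The release manages this for you; the sketch below, using a hypothetical cacheControlFor helper and an arbitrarily chosen one-year max-age, only illustrates the HTTP Cache-Control value a server could send for each kind of file.

```javascript
// Hypothetical helper: choose a Cache-Control value for a GWT output file.
function cacheControlFor(path) {
  // Fixed-name file: force revalidation so a new build is picked up.
  if (/\.nocache\.html$/.test(path)) {
    return "no-cache";
  }
  // Content-named file: a new build means a new name, so cache it long.
  if (/\.cache\.html$/.test(path)) {
    return "public, max-age=31536000"; // one year, an arbitrary choice
  }
  // Any other file: leave caching to the server's defaults.
  return null;
}
```

Checking the .nocache.html suffix first keeps the two rules unambiguous, although the second pattern would not match a .nocache.html name anyway, because it requires a literal dot directly before "cache".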
Monday, August 7, 2006
Google Maps API Tutorial
Developer.com is running an outstanding series of tutorials on the Google Maps API; check 'em out:
- Part 1: Integrating Google Maps into Your Web Applications
- Part 2: Retrieving Map Location Coordinates
- Part 3: Building a Geocoding Web Service
- Part 4: Build Your Own Geocoding Solution with Geo::Coder::US
- Part 5: Introducing Google's Geocoding Service
- Part 6: Performing HTTP Geocoding with the Google Maps API
[via the excellent Inside Open Source blog, one of my new favorites]
Thursday, August 3, 2006
Project Hosting 'R' Us
Last Thursday at the O'Reilly Open Source Conference, we announced the availability of project hosting on Google Code; our goal is to provide a service that helps foster innovation and supports Open Source projects through simple, easy-to-use and reliable tools.
Currently, there are several thousand projects underway and we're very pleased with the enthusiasm shown by the Open Source community. So thank you for your support, especially those of you providing valuable feedback. If you haven't created a project, give it a try. We look forward to incorporating your feedback too.
For more information please take a look at our FAQ or join in the discussion on Google Groups.
Wednesday, August 2, 2006
Google Gadget Guru
Posted by: Jessica Ewing, Product Manager
We've seen a lot of great gadgets created for the Google homepage recently. Topping the list of our Top Gadget Developers is Caleb Eggensperger. Caleb is a 16-year-old student at the Arkansas School for Mathematics, Sciences & the Arts. He's famous for his Countdown gadget, which our users are crazy about. Go figure.
We recently welcomed Caleb to our campus, where he met with our team to discuss his ideas around the personalized homepage as well as our Gadget API. Our conclusion was that Caleb is a genius and is going to take over the world someday, so we made sure to get on his good side. We took him to lunch, played a round of glow-in-the-dark mini golf, and arranged a run-in with Marissa Mayer. Congratulations, Caleb.
Want to try to knock Caleb out of the top spot and get invited to Google next year? Write some cool gadgets for the homepage.
MarkL on the AJAX Search API
Chris saw this email go by on an internal thread and thought it'd be great to re-post here; it's a note from Mark Lucovsky to James Atkinson (of phpBB), regarding the recently-released AJAX Search API:
---------- Forwarded message ----------
From: Mark Lucovsky
Subject: Re: Howdy from Google.
To: James Atkinson
Cc: Chris DiBona
James,
Thanks for getting back to me.
I am not sure if you have seen our latest API. Documentation and samples are at http://code.google.com/apis/ajaxsearch/
It's a classic mashup API that lets you easily add search to your site, but we have done this with a twist... We make it VERY VERY easy to remember or "clip" a search result onto your page.
Why did we do this?
We observed countless interactions in email and message boards where a question is posed, e.g., "Does anyone know of a good Sushi place in Santa Barbara?", or, "What kind of fancy new camera were you using at the game the other day?", or, "I am thinking of putting Campy Compact Cranks on my bike. Do you think this is a good idea?", or, "We just stayed at The St. Francis in San Francisco and had a great time."
Often times, the most accurate way to answer or add value to these discussions is with a search result. When responding to the Sushi question, a Google Local search result provides the name of the restaurant, the address, its phone number, as well as a link to the landing page on Google Maps. The result also contains the lat/lng coordinates so that if you have a map available, plotting the result on a map is trivial.
When developing the initial mockups and ideas for this API we built a very powerful demonstration, based on phpBB. What I did was change phpBB to include our little search control and made it possible to include search results into a post.
The changes to enable this were trivial... All I had to do was change subSilver/overall_header.tpl to include our stylesheet, and then subSilver/posting_body.tpl to fire our control, process clip events, and serialize the clipped content on submit.
I have included two screen shots. The first is a reply to a post about Sushi places near Google. Note that the reply contains two local search results.
Clicking on the title brings you to a Google landing page.
Obviously, I could have left phpBB, looked up Akane in Google Local, futzed around a little to get Local to produce a URL, and then pasted the URL into the response. This, in my opinion, represents "The Old Way"... something that only the tech savvy can master. In the real world, cutting and pasting and juggling multiple windows are not skills that we can or should take for granted.
With our search control, seamlessly integrated into phpBB, I type "Akane" into a search box, then click the "copy" button. The resulting post content is shown in the first attachment.
The second screenshot shows the editing experience. I took 300px to the right of the compose form and added in our search control. It's very simple to use and fits in very nicely with the rest of your app.
When I showed this demo to people, they all instantly "got it" and understood how much more valuable message board interaction could be when search results are a click away. Now granted, this isn't something that everyone would use in every single post, BUT I think everyone who saw this felt that this is the kind of thing that they would definitely use once a day in either an email, blogging, or message board environment.
I had never seen the phpBB code before. I simply unzipped it, set up a database, and within an hour, had found the three or four touch points that I had to edit in order to enhance it with this new capability. I think it dropped in very easily and naturally. It would be very cool to see this out in the wild, and I would be more than willing to help you guys get up and running, get started, whatever you need.
Let me know what you think.
-markl
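Mark's description of the phpBB change ends with serializing the clipped content on submit. As a rough, hypothetical illustration of that last step (not the actual template code, which the email doesn't show), the sketch below turns a list of clipped local results into phpBB BBCode links appended to the message body. The clip fields title, url, and streetAddress are assumed to mirror properties a local search result carries, and bbText, bbUrl, serializeClips, and appendClipsToMessage are made-up names.

```javascript
// Hypothetical sketch: serialize clipped search results as phpBB BBCode.
// Each clip is assumed to look like { title, url, streetAddress }.

// Square brackets would break BBCode tags, so strip them from display text...
function bbText(s) {
  return String(s).replace(/[\[\]]/g, "");
}

// ...and percent-encode them inside URLs.
function bbUrl(u) {
  return String(u).replace(/\[/g, "%5B").replace(/\]/g, "%5D");
}

// One line per clip: a [url] link, plus the street address when present.
function serializeClips(clips) {
  return clips
    .map(function (clip) {
      var line = "[url=" + bbUrl(clip.url) + "]" + bbText(clip.title) + "[/url]";
      if (clip.streetAddress) {
        line += " (" + bbText(clip.streetAddress) + ")";
      }
      return line;
    })
    .join("\n");
}

// On submit: append the serialized clips after whatever the user typed.
function appendClipsToMessage(messageText, clips) {
  if (clips.length === 0) {
    return messageText;
  }
  return messageText + "\n\n" + serializeClips(clips);
}
```

An empty clip list leaves the message untouched, so the serializer can run on every submit without special-casing posts that have nothing clipped.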
Tuesday, August 1, 2006
Google Summer of Code Mid-Term Report
As part of the GSoC mid-term evaluations, we asked our mentors to give us a review of everything from their students' progress to date to their favorite color. We had a great session at OSCON 2006 where we shared the aggregate results of the surveys and some additional statistics for the program. For those of you who weren't able to attend, we've posted excerpted slides for your perusal.
And since you'll no doubt be wondering, our mentors overwhelmingly prefer blue in all its various hues.