Angle brackets are a way of life
So, there’s a new kind of Android device in the world. The world still isn’t sure just where it is that tablets are the right tool for the job. That granted, this is a nifty product. And I’m developing my own theory of what tablets are for.
My impressions are based on a couple hours playing with one, which at this point is a couple hours more than almost anyone else. The model I played was not quite production — among other things, the product name stenciled on the back wasn’t “Galaxy Tab” — but close.
I won’t have one on next week’s trip to Mainz for MobileTech, but I’m pretty sure I’ll be able to take one along to GDD Tokyo and JAOO in Aarhus, Denmark.
Other coverage: At the Financial Times’ ft.com/techbog, also Android Central (with a useful iPad comparo), also Engadget.
All the apps I tried ran just fine, including a couple of immersive games that really benefited from the extra inches. I’ve heard of a few apps that misbehave, but their problems were obvious & easy to fix; watch for details over on the Android Dev Blog, starting later today.
Samsung has sprinkled some sugar on the out-of-the-box Google UI elements, and while the community’s opinions on hardware companies’ efforts to improve Android software have been, um, mixed (my own is extremely mixed), I have to say that the Samsungers have shown restraint, putting the extra real estate to good use in good places, for example the notifications pull-down. There may be some of that integrated-social-everything that frankly gets up my nose, but my nose remained clear around the Tab, so if it’s there it’s at least easy to ignore.
It’s snappy, especially on games where that matters; maybe there are places where servicing the extra bits in the 1024x600 screen will hurt, but I didn’t run across them.
It’s got a phone but (at least on the pre-release model I used) you can’t hold it up to your head, which is a good thing as that would look supremely dorky.
Did I mention that the screen is beautiful? Also it feels really good in the hand and looks pretty nice, and is obviously in the first microsecond’s glance not an iPad.
The trade-off is obvious. You win because you can show a bigger picture, which is important, and you lose because it just won’t fit in many pockets, which is important. It’ll go in most purses, though.
I know what I’ll use the Galaxy Tab for: to show off Android. The big screen just makes everything easier to see and point at, and graphics look outstanding, and it passes from hand to hand easily. Showing off Android is part of my job and this will help me do my job better.
Which leads to a general theory, reinforced by informal observation of hipsters with iPads in coffee shops: a tablet is, crucially, a more shareable computer. A laptop, with its fragile hinge-ware and space-gobbling keyboard, is just not comfy to share. A tablet is easier to bring to the café, easier to hand across the table or along the sofa, easier to seize in the heat of the moment, easier to hold up in triumph, easier to set aside when you need to meet someone’s eyes.
How big a market is that? Anyone who says they know is lying.
Posted at 10:01
The XML Security Working Group has published five working drafts today. XML Signature 2.0, Canonical XML 2.0 and the XML Signature Streamable Profile of XPath 1.0 are part of an ongoing effort to rework XML Signature and Canonical XML in order to address issues around performance, streaming, robustness, and attack surface. The Working Group has also published updated Working Drafts for its XML Signature Best Practices and XML Security Relax NG Schemas Working Group Notes. Learn more about XML Security.
Posted at 20:52
Herein is my response to the question
what does law.gov mean to you?
I am an IT architect and a builder of legislative systems more so
than a direct legal publisher. Having said that, I have worked with
most of the world's legal publishing entities at some time or other
over the last twenty years. My current focus is creating
legislative systems for legislatures, mostly in the U.S.A. The
content our systems produce is then published by the legislatures
themselves and also by third-party publishers.
I am a technologist first and foremost. I recently started blogging
about the
KLISS eDemocracy system here in Kansas in the hope that the
technical details I am blogging will help other technologists to
understand the legislative domain better and thus help create a
more informed tech community around one of the most important
aspects of any democracy.
I agree with pretty much everything Ed Walters said about the
AOL Moment that is currently happening in the legal publishing
industry. I also agree with pretty much everything Carl
Malamud says about the desirability of free, unfettered access to
authenticated, machine readable primary legal materials in the
context of the law.gov initiative.
For me however, the most interesting vista that law.gov opens up is
the potential for the most significant event in the evolution of
democracy since the funeral
oration of Pericles 2400 years ago. For the first time in human
history, we now have all the technological pieces we need to bring
participation in the democratic process to levels not seen since
ancient Greece when everyone could literally congregate in the same
place. To quote Don Heiman, CITO for the Kansas State
Legislature:
There are no longer any technical reasons why we cannot publish the
public activities of a legislature in real-time, or have statute
databases codified on the fly, or provide direct visibility of what
the impact of a proposed modification to the law would look like
before it gets voted on. No technical reason why we cannot allow
citizens to not only observe, but also participate in the making of
law *as it is being made* - not just see the results ex post
facto.
It is a lot of work for sure but it is only work at this
point. No new technology breakthroughs are required. What needs to
happen next (and there are signs it is happening) is for the world
of law and the world of software development to both come to the
realization that they are both in the same business from content
management and publishing perspectives. I really believe that law
is source code in the sense that the disciplines and techniques
that have been perfected in the software development world have a
tremendous amount to offer those who manage corpora of legal
texts.
I look forward to the day when we speak of, for example "release
7.8a (Rev 456422) of the consolidated statutes of Tumbolia (MD5:
checksum d03730288a7f0278e36afc82f220ddab)."
I look forward to the day when we can jump into a time machine and
look at Rev 674245 of the 2011 Legislative Biennium Corpus for
Tumbolia in order to better understand the legislative intent of an
amendatory bill.
I look forward to the day when we can look at the laws of Tumbolia,
as they were at noon Wed, 20 Jan 2010 in order to present attorneys
and the courts with a complete view of what the law said at the
time some contested action took place.
I look forward to the day when we can detail edit-by-edit how the
consolidated statutes of Tumbolia came to be what they are by
starting with the Constitution of Tumbolia from 1899 and rolling
forward changes to its statute from its session laws, step-by-step
with all the rigor of an accounting audit trail of transaction
ledgers.
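To make the "law is source code" idea a little more concrete, here is a minimal Python sketch of a revision log for a statute corpus. Everything in it is hypothetical: the `StatuteRepository` and `Revision` classes, their method names, and the sample text are illustrations only, not part of KLISS or any real legislative system. It shows the two things the wishes above rely on: a checksum that pins down a revision exactly, and a lookup of the text that was in force at a given moment.

```python
import hashlib
from bisect import bisect_right
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Revision:
    number: int        # monotonically increasing revision number
    enacted: datetime  # moment this revision took effect
    text: str          # full consolidated text at this revision

    @property
    def checksum(self) -> str:
        # MD5 is used only because the post mentions it; here it
        # identifies content and is not meant as a security measure.
        return hashlib.md5(self.text.encode("utf-8")).hexdigest()


@dataclass
class StatuteRepository:
    revisions: list = field(default_factory=list)

    def commit(self, enacted: datetime, text: str) -> Revision:
        # Keep the log ordered so point-in-time lookups can bisect it.
        if self.revisions and enacted < self.revisions[-1].enacted:
            raise ValueError("revisions must be committed in date order")
        rev = Revision(len(self.revisions) + 1, enacted, text)
        self.revisions.append(rev)
        return rev

    def as_of(self, moment: datetime) -> Revision:
        # Return the latest revision enacted at or before `moment`.
        dates = [r.enacted for r in self.revisions]
        idx = bisect_right(dates, moment)
        if idx == 0:
            raise LookupError("no revision in force at that time")
        return self.revisions[idx - 1]


repo = StatuteRepository()
repo.commit(datetime(1899, 1, 1), "Constitution of Tumbolia.")
repo.commit(datetime(2009, 7, 1), "Constitution of Tumbolia. Section 1 added.")
repo.commit(datetime(2010, 3, 1), "Constitution of Tumbolia. Section 1 amended.")

# What did the law say at noon on 20 Jan 2010?
in_force = repo.as_of(datetime(2010, 1, 20, 12, 0))
print(in_force.number, in_force.checksum)
```

A real system would store deltas rather than full texts and would need to model effective dates separately from enactment dates; the sketch ignores both to keep the idea visible.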
I hope that the law.gov initiative heads in that direction. The
http://legislation.gov.uk
website clearly points the way for what is possible. Speaking as a
technologist, we techies stand ready willing and able to make this
happen. Is the political will there to make it happen? Is the
disruption of the status quo too much too soon for such a staid and
contemplative field as law and law-making? I can answer neither of
these questions but I sincerely hope the answers are "yes" and "no"
respectively.
The biggest threat to any democracy is a disinterested electorate.
In years to come, I hope law.gov will be seen as the catalyst that
re-invigorated an entire generation to engage with the democratic
process. A process that too many currently feel is beyond their
realm of influence. We can change that now. For our sakes and the
sakes of future generations, I hope we do.
Posted at 15:18
Norm Walsh has published a very interesting post to his blog,
Reconsidering
specialization, part the first.
This is very significant and I eagerly await Norm's thoughts.
As Norm relates in his post, he and I had what I thought was a very
productive discussion about specialization and what it could mean
in a DocBook context. I think Norm characterized my position
accurately, namely that the essential difference between DocBook
and DITA is specialization and that makes DITA better.
Here by "better" I mean "better value for the type of applications
to which DITA and DocBook are applied". It's a better value
because:
1. Specialization enables blind interchange, which I think is very
important, if not of utmost importance, even if that interchange is
only with your future self.
2. Specialization lowers the cost of implementing new markup
vocabularies (that is, custom markup for a specific use community)
by roughly an order of magnitude.
There's more to it than that, of course, but those are the key
bits.
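As a rough illustration of why specialization enables blind interchange, here is a small Python sketch. It assumes the DITA convention in which every element carries a class attribute listing its ancestry from most general to most specialized (for example `- topic/p safety/warning `). The `safety/warning` specialization, the handler table, and the function names are all invented for this example; a real DITA processor does far more.

```python
def class_tokens(class_attr):
    """Split a DITA-style class attribute into module/type tokens.

    The leading "-" or "+" marker is dropped; the remaining tokens run
    from the most general type to the most specialized one.
    """
    return [token for token in class_attr.split() if "/" in token]


def find_handler(class_attr, handlers):
    """Return the handler for the most specialized type we know about."""
    for token in reversed(class_tokens(class_attr)):
        if token in handlers:
            return handlers[token]
    raise KeyError("no handler for any type in %r" % class_attr)


def render_paragraph(text):
    return "<p>" + text + "</p>"


# This processor was written before "safety/warning" existed and only
# knows the base vocabulary.
handlers = {"topic/p": render_paragraph}

# A specialized element from someone else's vocabulary still gets
# processed, by falling back to its base type.
handler = find_handler("- topic/p safety/warning ", handlers)
print(handler("Mind the gap"))
```

The point is the fallback walk: a tool that has never seen the custom vocabulary can still do something reasonable with it, which is what makes interchange "blind".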
All the other aspects of DITA that people see as distinguishing:
modularity, maps, conref, etc., could all be replicated in
DocBook.
If we assume that DITA's more sophisticated features like maps and
keyref and so forth are no more complicated than they need to be to
meet requirements, then the best that DocBook could do is implement
the exact equivalent of those features, which is fine. So to that
degree, DocBook and DITA are (or could be) functionally equivalent
in terms of specific markup features. (But note that any statement
to the effect that "DITA's features are too complicated" reflects a
lack of understanding of what the requirements are that DITA
satisfies--I can assure you that there is no aspect of DITA that is
not used and depended on by at least one significant user
community. That is, any attempt, for example, to add a map-like
facility to DocBook that does not reflect all the functional
aspects of DITA maps will simply fail to satisfy the requirements
of a significant set of potential users.)
But note that currently DocBook and DITA are *not* functionally
equivalent: DocBook lacks a number of important features needed to
support modularity and reuse. But I don't consider that important.
What really matters is specialization.
Note also that I'm not necessarily suggesting that DocBook adopt
the DITA specialization mechanism exactly as it's formulated in
DITA. I'm suggesting that DocBook needs the functional
equivalent of DITA's specialization facility.
Note also that DocBook as currently formulated at a content model
level probably cannot be made to satisfy the constraints
specialization requires in terms of consistency of structural
patterns along a specialization hierarchy and probably lacks a
number of content model options that you'd want to have in order to
support reasonable specializations from a given base.
But those are design problems that could be fixed in a DocBook V6
or something if it was important or useful to do so.
Finally, note that in DITA 2.0 there is the expectation that the
specialization facility will be reengineered from scratch. That
would be the ideal opportunity to work jointly to develop a
specialization mechanism that satisfied requirements beyond those
specifically brought by DITA. In particular, any new mechanism
needs to play well with namespaces, which the current DITA
mechanism does not (but note that it was designed before namespaces
were standardized).
Posted at 14:46
In recent days I’ve been thinking of JavaOne, as we kicked it around and decided we just couldn’t send speakers; and of Oracle OpenWorld, to which JavaOne will now serve as an appendage. It reminded me of a conversation I had last year about Oracle.
The conversation involved myself and a person with a convincing title who, as they’d say in the paper, was “familiar with the situation”.
My question was: “OpenWorld is this totally all-about-business conference. The Oracle Develop meeting is just a second-rate sidebar. Where does Oracle go about building developer mindshare?”
I’ll try to reproduce the answer in full as best as I can remember it:
“You don’t get it. The central relationship between Oracle and its customers is a business relationship, between an Oracle business expert and a customer business leader. The issues that come up in their conversations are business issues.
“The concerns of developers are just not material at the level of that conversation; in fact, they’re apt to be dangerous distractions. ‘Developer mindshare’... what’s that, and why would Oracle care?”
Posted at 22:33
Cancun and a day trip to the Riviera Maya bring me to country number 16.
Posted at 22:32
The Voice Browser Working Group has published a Working Draft of Voice Extensible Markup Language (VoiceXML) 3.0. VoiceXML is used to create interactive media dialogs that feature synthesized speech, recognition of spoken and DTMF key input, telephony, mixed initiative conversations, and recording and presentation of a variety of media formats including digitized audio and digitized video. Learn more about the Voice Browser Activity.
Posted at 19:33
It's been a few years since I first considered DITA specialization. I wonder if I missed the point? I think that might depend on the assumptions that I brought to the table.
Posted at 17:19
W3C is pleased to announce the creation of the HTML Speech Incubator Group, whose mission is to determine the feasibility of integrating speech technology in HTML5 in a way that leverages the capabilities of both speech and HTML (e.g., DOM) to provide a high-quality, browser-independent speech/multimodal experience while avoiding unnecessary standards fragmentation or overlap. The following W3C Members have sponsored the charter for this group: Voxeo, Microsoft, Openstream, Google, AT&T, Mozilla. Read more about the Incubator Activity, an initiative to foster development of emerging Web-related technologies. Incubator Activity work is not on the W3C standards track but in many cases serves as a starting point for a future Working Group.
Posted at 15:40
Dave Megginson (who drove the development of the SAX API that will be familiar to many XML developers who use Java) recently wrote Java is dead. Java stood out as a programming language (though not as a platform) in that...
Posted at 15:31
David Eaves: Creating effective
open government portals. Amen to that.
Here is the thing...most http://data.[whatever] websites are only
as good as their ability to serve up fresh content. That oftentimes
means that re-thinking back-end processes is required. Otherwise a
one-off data dump happens to get things rolling but then...
Nothing kills a web-o-data project so ruthlessly as information
latency.
Machine readable content - even more so than human readable content
- must be current.
Posted at 14:59
Here is another Project Euler problem that seems exactly what XSLT was not intended for:
Find the minimal path sum, in matrix.txt (right click and 'Save Link/Target As...'), a 31K text file containing a 80 by 80 matrix, from the top left to the bottom right by only moving right and down.
My solution:
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:saxon="http://saxon.sf.net/"
    xmlns:mx="my:my"
    exclude-result-prefixes="xs saxon mx">

  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <xsl:variable name="vMatrix" as="element()*">
    <xsl:variable name="vLines" as="xs:string*"
        select="tokenize(unparsed-text('matrix.txt'),'\s+')[.]"/>
    <xsl:for-each select="$vLines">
      <line>
        <xsl:for-each select="tokenize(.,',')">
          <v><xsl:value-of select="."/></v>
        </xsl:for-each>
      </line>
    </xsl:for-each>
  </xsl:variable>

  <xsl:variable name="vDimension" as="xs:integer" select="count($vMatrix)"/>

  <xsl:template match="/">
    <xsl:sequence select="mx:path-minSum(1,1,0)"/>
  </xsl:template>

  <xsl:function name="mx:path-minSum" as="xs:integer" saxon:memo-function="yes">
    <xsl:param name="pX" as="xs:integer"/>
    <xsl:param name="pY" as="xs:integer"/>
    <xsl:param name="pcurSum" as="xs:integer"/>

    <xsl:variable name="curVal" as="xs:integer" select="mx:mtx($pX, $pY)"/>

    <xsl:sequence select=
      "if($pX eq $vDimension and $pY eq $vDimension)
         then $curVal
         else
           for $nextX in min(($vDimension, $pX+1)),
               $nextY in min(($vDimension, $pY+1)),
               $s1 in if($nextY gt $pY)
                        then $pcurSum + $curVal + mx:path-minSum($pX, $nextY, $pcurSum)
                        else 999999999999,
               $s2 in if($nextX gt $pX)
                        then $pcurSum + $curVal + mx:path-minSum($nextX, $pY, $pcurSum)
                        else 999999999999
           return min(($s1,$s2))
      "/>
  </xsl:function>

  <xsl:function name="mx:mtx" as="xs:integer" saxon:memo-function="yes">
    <xsl:param name="pX" as="xs:integer"/>
    <xsl:param name="pY" as="xs:integer"/>

    <xsl:sequence select="$vMatrix[$pY]/*[$pX]"/>
  </xsl:function>
</xsl:stylesheet>
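For readers who want to cross-check the result outside XSLT, the same recurrence (the cost of a cell plus the cheaper of its two neighbours) can be written bottom-up. This is a Python sketch and not the author's method: the function names are illustrative, the small sample grid is made up for the demonstration, and `load_matrix` simply assumes the comma-separated layout described in the problem statement.

```python
def load_matrix(path):
    """Read a comma-separated matrix file into a list of integer rows."""
    with open(path) as handle:
        return [[int(cell) for cell in line.split(",")]
                for line in handle if line.strip()]


def min_path_sum(matrix):
    """Minimal top-left to bottom-right path sum, moving only right or down."""
    cols = len(matrix[0])
    # best[x] holds the cheapest cost of reaching column x in the row
    # processed so far; infinity marks "not reachable yet".
    best = [float("inf")] * cols
    best[0] = 0
    for row in matrix:
        best[0] += row[0]  # the first column can only be reached from above
        for x in range(1, cols):
            # best[x] is still the value from the row above;
            # best[x - 1] is already the value to the left in this row.
            best[x] = row[x] + min(best[x], best[x - 1])
    return best[-1]


sample = [
    [1, 3, 1],
    [1, 5, 1],
    [4, 2, 1],
]
print(min_path_sum(sample))  # prints 7 (path 1, 3, 1, 1, 1)
```

For the actual problem, `min_path_sum(load_matrix("matrix.txt"))` would be the call, assuming the file sits in the working directory. The rolling one-row array plays the role that the memoised function plays in the stylesheet.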
Posted at 12:23
I read Michael Lewis’ The Big Short: Inside the Doomsday Machine months ago, and have been feeling guilty about not recommending it, because this material is sort of essential for anyone who would like to understand how our economy ended up in the toilet. Read on, not just for a (spoiler: positive) review, but for potentially time- and money-saving advice.
Sidebar: Michael Lewis
I should disclose that I’m a hopeless Michael Lewis fan; in my review of Moneyball I wrote “I suspect there may not be a greater living writer of reportorial non-fiction” and yes, I still suspect that. So you could either use my admitted bias to discount this review, or alternately join the club and next time you see a Lewis book in the airport bookstore, just grab it.
Sidebar: Maybe Don’t Read the Book
The book has its roots in a November 2008 piece in Portfolio magazine, The End. I’m not actually sure that the book is a better piece of work than the (much shorter) essay; and I am pretty sure that all the really important lessons of the book are there in The End.
Now if you enjoy Lewis’ narrative of this frankly-incredible-even-though-we-all-watched-it-happen story, you probably want to get the book, just because there’s room for more narrative and the rest is mostly just as good.
But considered as a work of the non-fiction writer’s art, I’d have to favor the shorter version for its ruthless focus and pace.
It’s simple enough; Lewis found individuals and partnerships (three in the book, one in the essay) who decided the mortgage bubble was bullshit far before most others did, and made gobs of money betting it would pop, and he walks us through their experience. Most of their time doing this was pretty stressful, because a lot of apparently-smart people were betting against them every step of the way and taking home millions doing so. Even when it all fell apart and they got paid it was stressful, because they understood the scale of the disaster before anyone else.
Anyhow, they’re interesting people, the financial tools used both for inflating the bubble and betting against it are interesting too, and Lewis can tell a story with the best of them. Unless you’re oblivious to recent economic history, you’ll like it.
It’s one we should have learned already, long ago: The business of Finance is 100% about making big money for people in the business of Finance. Everything else is irrelevant. The party line is that it’s about routing capital from those who have it to those who need it in a maximally-efficient market-driven way. Hah hah hah.
There’s another legend that it’s about making money for investors. Double hah hah hah. If you’re an investor, it’s about them making bets with your money and if they lose you lose, if they win you get a few scraps and they get a bigger vacation home.
This really should not be surprising. There is apparently no social or ethical force that will cause people to bypass a chance to shovel money into their own pockets, without regard for catastrophic costs to their fellow-humans no matter how predictable. This needs to be an axiom in the thinking of all future regulatory planners.
Of all the failures that led to the big meltdown, the most aggravating is the failure of the bond-rating agencies. These people took good money for pasting AAA credit ratings on piles of the most implausible shit imaginable, and what’s irritating isn’t that they did it, it’s that apparently they didn’t break any laws and thus there’s little prospect of the long prison terms that anything smelling of natural justice would require.
If you shared my blank, astonished “how could that happen?” reaction, you’ll probably enjoy Roger Lowenstein’s Triple-A Failure, published in April 2008 in the New York Times.
Once again, not surprising: having debt ratings paid for by the people issuing debt creates a huge conflict of interest, and per Lesson 1, any such conflicts will be taken advantage of by Finance insiders to fleece the sheep otherwise known as you and me.
Finance’s relationship to the economy should best be considered by policymakers as that of a dangerous parasite to its host. Any benefits offered at the margin by its market-making functions are dwarfed by the existential threats it is empirically observed to pose, on a regular basis, to the proper functioning of the real economy.
An earlier draft had the word “real” in the previous sentence enclosed in quotes. I took them away, because it really is real, as opposed to Finance which, on the evidence, is the mostly-toxic product of pure imagination, imagination fevered by a lethal illness that the rest of us are in danger of catching.
I’d say Finance should be regulated into utility status all over the civilized world, and if the community of hard-core financial engineers really wants to go on being a collective pathogen, they’ll be forced to acknowledge that what they do just isn’t civilized behavior, and do it somewhere else. We’ll be way better off without them among us.
Posted at 22:31
Software companies love hiring people who like solving hard technical problems. On the surface this seems like a good idea; unfortunately, it can lead to situations where the people building a product focus more on the interesting technical challenges they can solve than on whether the product is actually solving problems for their customers.
I started being reminded of this after reading an answer to a
question on Quora about
the difference between working at Google versus Facebook where
David Braginsky wrote
Culture:
Google is like grad-school. People value working on hard problems, and doing them right. Things are pretty polished, the code is usually solid, and the systems are designed for scale from the very beginning. There are many experts around and review processes set up for systems designs.
Facebook is more like undergrad. Something needs to be done, and people do it. Most of the time they don't read the literature on the subject, or consult experts about the "right way" to do it, they just sit down, write the code, and make things work. Sometimes the way they do it is naive, and a lot of time it may cause bugs or break as it goes into production. And when that happens, they fix their problems, replace bottlenecks with scalable components, and (in most cases) move on to the next thing.
Google tends to value technology. Things are often done because they are technically hard or impressive. On most projects, the engineers make the calls.
Facebook values products and user experience, and designers tend to have a much larger impact. Zuck spends a lot of time looking at product mocks, and is involved pretty deeply with the site's look and feel.
It should be noted that Google deserves credit for succeeding where other large software companies have mostly failed: throwing a bunch of Ph.D.s at a problem and actually having them create products that impact hundreds of millions of people, as opposed to research papers that impress hundreds of their colleagues. That said, it is easy to see the impact of complexophiles (props to Addy Santo) in recent products like Google Wave.
If you go back and read the Google Wave announcement blog post it is interesting to note the focus on combining features from disparate use cases and the diversity of all of the technical challenges involved at once including
The product announcement read more like a technology showcase than an announcement for a product that is actually meant to help people communicate, collaborate or make their lives better in any way. This is an example of a product where smart people spent a lot of time working on hard problems but at the end of the day they didn't see the adoption they would have liked because they spent more time focusing on technical challenges than ensuring they were building the right product.
It is interesting to think about all the internal discussions and time spent implementing features like character-by-character typing without anyone bothering to ask whether that feature actually makes sense for a product that is billed as a replacement for email. I often write emails where I write a snarky comment then edit it out when I reconsider the wisdom of sending that out to a broad audience. Nobody wants other people to actually see that authoring process.
Some of you may remember that there was a time when I was
literally the face of XML at Microsoft (i.e. going to http://www.microsoft.com/xml
took you to a page with my face on it). In those days I spent a lot of time using phrases like the
XML<-> objects impedance mismatch to describe the fact that
the dominant type system for the dominant protocol for web services
at the time (aka SOAP)
actually had lots of constructs that don’t map well to a
traditional object oriented programming language like C# or Java.
This was caused by the fact that XML had grown to serve conflicting
masters. There were people who used it as a basis for document
formats such as DocBook and
XHTML. Then there
were the people who saw it as a replacement for the binary
protocols used in interoperable remote procedure call technologies
such as CORBA and
Java
RMI. The W3C decided to solve this problem by getting a bunch
of really smart people in a room and asking them to create some
amalgam type system that would solve both sets of completely
different requirements. The output of this activity was XML Schema which became the type
system for SOAP, WSDL and the WS-* family of technologies. This
meant that people who simply wanted a way to define how to
serialize a C# object in a way that it could be consumed by a Java
method call ended up with a type system that was also meant to be
able to describe the structural rules of the HTML in this blog
post.
Thousands of man years of effort were spent across companies like Sun Microsystems, Oracle, Microsoft, IBM and BEA to develop toolkits on top of a protocol stack that had this fundamental technical challenge baked into it. Of course, everyone had a different way of trying to address this “XML<-> object impedance mismatch”, which led to interoperability issues in what was meant to be a protocol stack that guaranteed interoperability. Eventually customers started telling their horror stories about actually using these technologies to interoperate, such as Nelson Minar’s ETech 2005 Talk - Building a New Web Service at Google, and a movement around building web services using Representational State Transfer (REST) was born. In tandem, web developers realized that if your problem is moving programming language objects around, then perhaps a data format that was designed for that is the preferred choice. Today, it is hard to find any recent, broadly deployed web service that doesn’t use JavaScript Object Notation (JSON) as opposed to SOAP.
The moral of both of these stories is that a lot of the time in software it is easy to get lost in the weeds solving hard technical problems that are due to complexity we’ve imposed on ourselves through some well-meaning design decision, instead of actually solving customer problems. The trick is being able to detect when you’re in that situation and seeing whether altering some of your base assumptions leads to a big simplification of your problem space, which then frees you up to actually spend time solving real customer problems and delighting your users. More people need to ask themselves questions like: do I really need to use the same type system and data format for business documents AND serialized objects from programming languages?
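To show how little machinery the "just move objects around" case needs, here is a minimal Python sketch of a round trip through JSON. The `Order` type and the `to_wire`/`from_wire` helpers are hypothetical names made up for this illustration; they are not taken from any of the toolkits mentioned above.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Order:
    order_id: int
    customer: str
    items: list


def to_wire(order):
    # Fields map directly onto JSON's object, number, string and array
    # types, so no separate schema language is involved.
    return json.dumps(asdict(order), sort_keys=True)


def from_wire(payload):
    # The receiving side rebuilds the object from plain key/value pairs.
    return Order(**json.loads(payload))


original = Order(42, "Ada", ["widget", "gadget"])
payload = to_wire(original)
restored = from_wire(payload)

print(payload)
print(restored == original)  # prints True
```

The simplicity comes from dropping a requirement, not from cleverness: nothing here can describe mixed content or document structure, which is exactly the part of the problem that serialized objects never needed.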
Now Playing: Travie McCoy - Billionaire (featuring Bruno Mars)
Posted at 02:47
I travel quite a bit, and I have found that the “tethering & portable hotspot” facility in Android 2.2 is just absolutely wonderful. It has saved me considerable money and got me reasonably-good connectivity in places I wouldn’t otherwise have had it; I’m looking at you, big-name US hotel chains.
When I heard that telephone companies were charging extra for this, I couldn’t figure out how they were doing it; without considerable deep-packet inspection, how can you tell that there are other computers gatewaying through my Nexus One, which in fact seems to hotspot just fine on certain networks that are said to charge extra? The answer is obvious but only once you see it: the network operators modify Android on the locked phones they sell cheap along with a contract (perfectly legal, it’s open-source) to remove the built-in tethering/hotspot option, and replace it with one of their own, which they charge for.
I’m not going to weigh in on the pros and cons of the business model, because I have no insight into telco cost structures or indeed what would happen if tethering became free for everyone. There’s no doubt that for some of us it’s a major value-add and it doesn’t seem unreasonable to pay a little extra for it. I paid a few bucks a month for Boingo until I got this going, and that seemed fair.
However, I will point out that for people who travel a lot, an unlocked phone (in the range of $500 for most decent Android devices) might end up looking cheap.
Further practical advice: plug that puppy in if you’re going to be doing this for more than a few minutes, because that WiFi radio seems to eat watts in hotspot mode. And don’t stick it in your pocket; the Nexus One, at least, runs way hot when plugged-in and tethering.
Posted at 22:23
I installed the Google Chat Voice plugin today, and found that I was able to make free Google Voice calls from Canada to both a US and a Canadian POTS number. I’m still unable to register for Google Voice at google.com/voice, and I cannot use the Google Voice app on my Android phone, but at least I can initiate a call from inside GMail on my laptop now.
Does this mean that Google is about to roll out full Google Voice support for Canada, or just that they forgot to plug a hole in the code that’s supposed to prevent non-US accounts from using the service?
Google Voice on my Android phone would be fantastic, because I could make unlimited North American phone calls on (say) a 6 GB, $30/month data plan instead of paying the world’s highest cell phone bills for (limited) voice and long distance. I’m sure Rogers and other mobile carriers won’t be happy about that, but I hope their lobbyists can’t stop it.
Tagged: business,
mobile,
news,
voip
Posted at 01:04
Like many people I know, the dichotomy between doing and blogging is often resolved by more doing, and not so much blogging, especially with Twitter, Identi.ca, et al around for the quick asides. Time to craft a careful post is in short supply, especially sufficient time to craft a post that looks effortless.
But today one of my projects has finished one major phase so I’m taking some time. I’ve started working in healthcare, or more precisely, doing project management on a project basis for Alschuler Associates, involving lots of XML, lots of client discussions, and working with a distributed team across 3 timezones. It’s interesting, and complicated, and I still feel like I’m just getting started although I’ve been working on it for almost six months.
And it’s just as well those projects are in a slower spell, since in a little over a week the XML Summer School starts, for which I’m Course Director. Most of the prep work has been done, and soon the fun and learning start. I enjoy going each year, catching up on new technologies, learning more about the ones I’ve heard about before but haven’t had a chance to try out, catching up on what’s new in the world of XML. I didn’t make it to Balisage this year due to project commitments (see above); the XML Summer School makes up for that to some extent. And this year we’re in Oxford at the right time for the St Giles Fair, which makes for a change to the usual pub crawl.
Other projects are taking a back seat, unfortunately. There’s only so much time in the day, and so many interesting things to fill it with.
Posted at 15:56
I’m heading to a new adventure at Digg in San Francisco to be a lead software engineer working on APIs and syndication.
I’ve been at Yahoo! nearly 5 years so it is both a happy and sad time for me, and I wish all the excellent people I worked with the best of luck in future.
Here is a summary of the main changes:
Exciting!
Posted at 20:44
Some of these puppies have been keeping a browser tab open since April. No theme; ranging on the geekiness scale from extreme to mostly-sociology.
First, the good news. There’s real demand for senior people in our trade. Simon Phipps, who got me the job at Sun and whose opinions I pay careful attention to even when I disagree, has a new gig at ForgeRock, where they’re trying to build a sensible profitable business around open-source principles and some damn good technology that Oracle was too stupid to get behind.
Also, my long-time compatriot Dave Orchard just started looking for a gig; we had coffee the other day and he’s fielding some super-interesting offers. He hasn’t accepted any; if you want that sort of talent, better move fast.
On the other hand, half the people out there are women, and while I have to say that their progress through the educational and business worlds gives lots of reason for cheer, we still are mostly failing at attracting them to technology careers. A few pieces on this front crossed my radar recently: Nicole Sullivan a.k.a. “Stubbornella” on Woman in technology, Alice Adams’ What Women Want and How Not to Give It to Them, and Anil Dash’s Mechanisms of Exclusion. These are neither short nor uncontroversial, but I’ll leave my side of the controversies out and just assert that they’re really worth reading. Well, except to say, in response to Anil, that I’d advise most entrepreneurs, women and men both, to stay well away from VCs at this moment in history.
I’d previously come across Harald Welte as one of the leaders of the fascinating but fruitless OpenMoko project; his Anatomy of contemporary GSM cellphone hardware (PDF) is deep and well-written. Used to be I didn’t understand how all that “radio” layer stuff worked; I still don’t, but now I sort of know what I don’t know.
Christian Neukirchen: Programming for Android with Scala.
Probably because I spend way too much time in airplanes, I’ve always enjoyed Flight Level 390, by an anonymous commercial airline captain, on the pains and pleasures of flying Airbuses all over the New World.
In Independence Day Over Pensacola, he talks about a tricky landing in Orlando and as he’s winding up, writes “The crew van is rolling as iPhones and Droids come out of pockets and purses to call loved ones.” And that’s about how it is, you know; the mainstream is Them and Us, for now.
I used to worry about it all the time in my previous job, and I still watch that world. I see that there was an Intel Threading Challenge 2010, and I’m unsurprised that it was won by Dmitriy V’jukov, also the winner of my Wide Finder 2 challenge, with some of the gnarliest C code imaginable. Which served to demonstrate my point that this stuff is still way, way too hard.
Oh, and that Intel challenge has a Phase 2.
I’m talking about Michael Nygard’s The Future of Software Development, which contains probably the harshest prognosis for Java’s future that I’ve read from someone who’s actually speaking in a reasoned tone of voice.
Here are two short essays on the same subject: Why many people like using Ruby; I’m one of them: Michael Bleigh writes The Future’s Pretty Cool, or Why I Love Ruby and Len Smith’s 8 Reasons I love Ruby.
From The Economist, Computer says no; I’m delighted that someone is telling civilians the truth about how badly our discipline is practiced, most places, most times.
From Kellan, and oh my goodness does he put it well. Minimal Competence: Data Access, Data Ownership, and Sharecropping. Sample quote: “It’s your data, and you’ve granted us a limited license to use it... The ability to get out the data you put in is the bare minimum. All of it, at high fidelity, in a reasonable amount of time.”
Mmmmm, tasty.
Posted at 02:01
In a post
about Gridworks Jeni says:
"Like a lot of spreadsheets created by normal people, who want to
create something readable by human beings rather than computers, it
has some extra lines at the top to explain what the spreadsheet
contains..."
There is a terribly, terribly common pattern here and it has always
surprised me that spreadsheet developers have never made row 1 and
col 1 "special" for exactly this reason. I've lost count of the
number of spreadsheets I've seen that have labels in row 1, labels
in col 1 and data in the intersection cells.
Subject, predicate, object anyone:-)
Where do all the triples go?.
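To make the "subject, predicate, object" point concrete, here is a small Python sketch that reads a grid with labels in row 1 and column 1 and emits one triple per intersection cell. The function name and the sample labels are invented for illustration; nothing here comes from a real spreadsheet.

```python
def table_to_triples(rows):
    """Turn a grid with labels in row 1 and column 1 into triples."""
    header, *body = rows
    predicates = header[1:]            # row 1 holds the column labels
    triples = []
    for row in body:
        subject, *values = row         # column 1 holds the row label
        for predicate, obj in zip(predicates, values):
            triples.append((subject, predicate, obj))
    return triples

# Hypothetical grid: the top-left corner cell is empty.
grid = [
    ["",      "colour", "size"],
    ["item1", "red",    "small"],
    ["item2", "blue",   "large"],
]
for triple in table_to_triples(grid):
    print(triple)
```

Each intersection cell becomes the object, with its row label as subject and its column label as predicate.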
Posted at 19:27
This is indeed a sad day for all of us, for on October 1, a great app will be gone. Though we hardly had enough time during his short life to get to know him, like the grass that withers and fades, this monkey will finish his earthly course.
I know he left many things undone, for example only enhancing 60% of the delivered result pages. He never got a chance to finish his life’s ambition of promoting RDFa and microformats to the masses or to be the killer app of the (lower-case) semantic web. You could say he will live on as “some of this structured data processing will be supported natively by the Microsoft platform”. Part of the monkey we loved will live on as enhanced results continue to flow forth from the Yahoo/Bing alliance.
The SearchMonkey Alumni group on LinkedIn is filled with wonderful mourners. Micah Alpern wrote there
I miss the team, the songs, and the aspiration to solve a hard problem. Everything else is just code.
Isaac Asimov was reported to have said “If my doctor told me I had only six minutes to live, I wouldn’t brood. I’d type a little faster.” Today we can identify with that sentiment. Keep typing.
-m
Posted at 06:07
When we encourage people to put their data on the web as linked data, the biggest question is “How?”. There are so many “How?” questions to answer:
and, of course:
Our goal within the linked data part of data.gov.uk (and I know we haven’t achieved it yet) is to both answer these questions and to make the answers as simple as possible. The answers to the questions cannot either require up-front knowledge of all possible types of data that might be published or depend on the availability of linked data for all the things we want to talk about. It cannot require registration at centralised services. It cannot require everyone to do everything in the same way or at the same pace.
We must adopt an approach that encourages people to make their data available in forms that are easier for other people to pick up and use because they see the benefits for them and their stakeholders and because the effort of doing so is not too high to bear. We must grow, adapt and evolve incrementally. If linked data eventually wins, it will be due to its benefits, not to faith.
Anyway, enough rant. The point of this blog post is to talk about one of the answers to the ‘How do we create it?’ question: using Freebase Gridworks. For those who haven’t encountered it, Gridworks is an incredibly useful application that enables you to easily analyse, clean and manipulate tabular data. In a few steps, it can be used to generate linked datasets which can then be published on the web just like any other file, ready for other people to reuse without jumping through hoops. I’m going to assume that you can download it and install it following the instructions provided on the Gridworks site.
In this post, I’m going to talk about how to use Gridworks to generate linked data, using an example of local government spending data from Windsor and Maidenhead council. Like a good train journey, there’s quite a lot to see along the way.
Note: Many thanks to Dave Reynolds for his work on this data and comments on an earlier version of this post.
The first step is to import the data into Gridworks. If you just take the Windsor & Maidenhead data and import it directly, you’ll get a single not-very-useful column as shown in the following screenshot:

If you look at the spreadsheet in a normal spreadsheet programme then you’ll see why. Like a lot of spreadsheets created by normal people, who want to create something readable by human beings rather than computers, it has some extra lines at the top to explain what the spreadsheet contains, as shown in the following screenshot:

Fortunately, Gridworks lets us easily skip over these first few
lines. When you import the data, put the number 1 in
the box for “Ignore X initial non-blank lines”, as shown here:

(You need the number 1 because although there are
three lines before the table really starts, the second two of those
are blank.)
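For anyone doing the same thing outside Gridworks, here is a rough Python sketch of what that import option amounts to. It assumes the spreadsheet has been saved as CSV; the function name `read_table` and the sample text are made up for illustration, and I'm assuming that blank lines are simply dropped, which is how the option appears to behave.

```python
import csv

def read_table(text, ignore_non_blank=1):
    """Parse CSV text, skipping blank lines and the first few non-blank ones."""
    # Treat lines holding only commas or whitespace as blank (an assumption).
    lines = [line for line in text.splitlines() if line.strip(" ,\t")]
    # What remains after the skipped lines starts with the real header row.
    return list(csv.DictReader(lines[ignore_non_blank:]))

# Hypothetical layout: one title line, two blank lines, then the table.
sample = (
    "Payments over 500\n"
    "\n"
    ",,\n"
    "Directorate,Service\n"
    "Example Directorate,Example Service\n"
)
rows = read_table(sample)
print(rows[0]["Service"])
```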
That done, the data should look a lot more useful, as shown in the following screenshot:

The next thing to do is to explore the data a bit to get a handle on what’s there and work out whether any cleaning or rationalisation is necessary to improve its quality.
With columns that hold names, such as ‘Directorate’, ‘Service’
or ‘Supplier Name’, you’re looking for slight misspellings caused
by bad data entry. Gridworks helps you find these by creating a
list of the distinct values for a particular column and telling you
how many instances there are of each. Use the arrow at the side of
the column name to pull down the menu, then choose Facet >
Text Facet to create this list, as shown here:

Once you’ve chosen Text Facet, the list pops up on
the left-hand side of the window. You can click on a value to filter
the table to just those rows that have that value for that
column, but you can also scan through the list to spot any places where
there looks to be a typo or two entries that should really be the
same. For example, the Services list holds both ‘Libraries &
Information Services’ and ‘Library & Information Services’, as
shown here:

It’s unlikely that there are really two distinct services with
such similar names, so we’d like to clean up this data by
standardising on one name or another. You can quickly change all
occurrences of one value to another using the edit
option that appears just to the right of the value when you hover
over it. This brings up a dialog that enables you to change all of
those values to something else, as shown here:

You can do something similar with numeric columns, such as the
‘Amount excl vat £’ column. This time choose Numeric
Facet rather than Text Facet and you’ll get a
histogram up as shown here:

This is useful for identifying outliers. If you grab the handle on the left of the histogram and move it to the centre, the rows will get filtered to only those that have an amount within that range. For example, moving it to only show rows between £500,000 and £1,500,000 shows that there are three payments of this size, all made by Children’s Services to Wilmott Dixon Construction Limited, as shown in this screenshot:

Although these values are much higher than most of the others in the spreadsheet, they don’t seem to be errors — I guess a new school was being built or something — so there’s nothing to correct here, but it shows how numeric facets can be used to explore the data.
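The same exploration can be sketched in a few lines of Python: bucket the amounts to get a crude histogram, then filter to a range to look at the outliers. The figures below are made up, and the function names are mine rather than anything from Gridworks.

```python
from collections import Counter

def numeric_facet(amounts, bucket_width):
    """Histogram: map each bucket's lower bound to the number of amounts in it."""
    return Counter(int(a // bucket_width) * bucket_width for a in amounts)

def in_range(amounts, low, high):
    """Keep only the amounts inside the selected range (inclusive)."""
    return [a for a in amounts if low <= a <= high]

amounts = [620.0, 750.5, 1200.0, 900000.0]  # made-up figures
print(numeric_facet(amounts, 500000))
print(in_range(amounts, 500000, 1500000))
```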
Another approach to exploring and cleaning the data is to use
the clustering algorithms that are built into Gridworks to identify
duplicates. To do this, pull down the column menu and this time
choose Edit Cells... > Cluster and Edit, as shown
in the following screenshot, this time for the ‘Supplier Name’
column:

This brings up a dialog that groups together values that look similar: in this case, ‘Siemens plc’ and ‘Siemens PLC’, as shown in the following screenshot:

You can use this dialog to change all the similar values to a
standard one. Check the Merge checkbox for the
clusters of values that should be merged, edit the New Cell
Value field to whatever standard value you want to adopt,
and choose Apply & Re-cluster or simply
Apply & Close to make the change.
You will often find that the default clustering algorithm (key collision/fingerprint) doesn’t come up with any clusters as it’s fairly conservative. It’s worth playing around a bit with different algorithms to look for other duplicates by selecting other possibilities from the drop-down menus. For example, choosing the ‘nearest neighbour’ method with the Levenshtein distance function and a radius of 2 (edits) results in four possible duplicates within the Suppliers list, as shown here:

If you’re not sure about whether the cluster is due to a typo or
not, hover over the row and click on the Browse this
cluster link that appears. That will bring up a separate
window that will show you just the rows in the cluster, from which
you should be able to make a judgement. For example, it’s not clear
whether ‘Academia Ltd’ is a typo for ‘Academics Ltd’ but browsing
the cluster shows that the Cost Centre codes and the Types of the
transactions are completely different for the two Suppliers, so
they are probably different.
The next step is to derive some data from what we have within the spreadsheet. Since our goal is to produce linked data, the kind of derived data that we’re interested in are URIs.
At this point we need to start making decisions about what URIs
to use. If you look at the list
of spending data from Windsor and Maidenhead, you’ll see that
there are a whole bunch of these spreadsheets. It would be really
useful if we could tie these spreadsheets together by using the
same URIs for the same things across the datasets. For that reason,
the only URI that’s going to be local to the dataset is the URI for
each line (or data point if you like) itself. On the other hand,
most of the things that are named here are going to be local to
Windsor & Maidenhead: ‘Abba Cars’ may be sufficient to identify
a single company within Windsor & Maidenhead, but certainly
wouldn’t be nationwide. So the URIs I’m going to create here are
mostly going to be within the www.rbwm.gov.uk
domain.
Here’s the table of the columns and the associated URIs that I’m going to use. I should stress that this is just for example purposes, but I’ve used the following principles:
/id at the
start of the path, and URIs for conceptual things should have
/def at the start of their paths; both should result
in a 303 redirection to a suitable web page
This is what we’re doing within data.gov.uk, but it’s an important principle of the web that different councils might well choose their own URI schemes, depending on the kind of technology support that they have, without any bad side-effects on the interpretation of the data.
| Column | URI pattern |
|---|---|
| (Dataset) | http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2 |
| (Row/ExpenditureLine) | http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2#{row-number} |
| (Council) | http://statistics.data.gov.uk/id/local-authority/00ME |
| Directorate | http://www.rbwm.gov.uk/id/directorate/{directorate-slug} |
| Updated | http://reference.data.gov.uk/id/day/{date} |
| TransNo/Payment | http://www.rbwm.gov.uk/id/transaction/{transaction-number} |
| Service | http://www.rbwm.gov.uk/id/service/{service-slug} |
| Cost Centre | http://www.rbwm.gov.uk/def/cost-centre/{cost-centre-code} |
| Supplier Name | http://www.rbwm.gov.uk/id/supplier/{supplier-slug} |
As you can see, those of the columns that contain text fields have, as part of their URI, a ‘slug’. This is a shortened, normalised value suitable for putting in a URI: basically ensuring that the string doesn’t contain any punctuation or spaces. For example, ‘Adult & Community Services’ would turn into ‘adult-community-services’.
Our first task will be to create these slugs. To do this, we’ll
create a new column based on the existing ones by choosing
Edit Column > Add Column Based on This Column ...
from the drop-down menu on the appropriate column:

Selecting this will bring up a dialog which will ask you to name the new column and then enter a formula to calculate the new value, as shown here:

The default language for this formula is Gridworks’ own, though there are other options available. To create the slug, we need to:
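This is done in two stages. The first three steps (lowercasing the value, replacing spaces with hyphens, and removing any character that isn’t a hyphen, a lowercase letter or a digit) can be done using the formula: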
replace(replace(toLowercase(value), ' ', '-'), /[^-a-z0-9]/, '')
Gridworks helps by listing the original and resulting values for
the first several rows of the spreadsheet, so that you can see
whether it’s working as expected. When you’re happy, hitting
OK creates the new column.
The last step (replacing all sequences of two hyphens with a
single hyphen) can be done by editing the cells in the new column.
Bring up the Edit Cells... > Transform... dialog
using the menu:

and use the formula:
replace(value, '--', '-')
then check the Re-transform until no change
checkbox so that any pairs of hyphens are repeatedly replaced with
single hyphens, as shown here:

The other tabs in the new column and edit cells dialogs are
really helpful. The History tab lets you choose
formulae that you’ve used before to use again. This is useful here
because we want to create the slugs for the Service and Supplier
Name in the same way. The Help tab lists all the
functions that you can use within the formula.
Creating the URIs for the columns proceeds in the same way, except this time the formulae are more like:
'http://www.rbwm.gov.uk/id/directorate/' + value
There are two that are slightly different. First, there’s the URI for the date, which needs to be constructed from the date/time value held by Gridworks as follows. We can do this in two stages. First, to construct a new column called ‘Date’ to hold the formatted date:
datePart(value, 'year') + '-' +
if (datePart(value, 'month') < 9, '0', '') + replace(datePart(value, 'month') + 1, '.0', '') + '-' +
if (datePart(value, 'day') < 10, '0', '') + datePart(value, 'day')
(note that the datePart() function returns a
0-based count for the month) and then to create the Date URI column
based on this:
'http://reference.data.gov.uk/id/day/' + value
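For comparison, the same day URI is much less fiddly to build in a language with date formatting built in. A small Python sketch (the function name is mine; note that Python's months are 1-based, so there is no `+ 1` adjustment):

```python
from datetime import date

def day_uri(day):
    """Build the reference.data.gov.uk URI for a calendar day."""
    # isoformat() gives a zero-padded YYYY-MM-DD string.
    return "http://reference.data.gov.uk/id/day/" + day.isoformat()

print(day_uri(date(2010, 4, 9)))
```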
Second, there’s the URI for the row (an expenditure line) itself, which needs to be constructed using the row number. It’s useful to construct it as a local URI (ie just the fragment) as this means the same code can be used to construct the column across different datasets, so it’s just:
'#' + rowIndex
Once the extra columns have been made, it’s time to export data
from Gridworks. While Gridworks makes it easy to export to CSV or
into Freebase, it’s also possible to export in any format you want
using templates. Use the Project menu and choose
Export Filtered Rows > Templating ..., as shown in
the following screenshot:

Note that this will only export the rows that you currently have selected, so if you want to export everything, make sure that you deselect any facets that you’ve currently got selected.
Choosing the Templating ... option will open up a
dialog that you can use to create whatever format you want. The
default, as shown in the following screenshot, is JSON.

On the left are four fields:
One thing to be extremely careful of here is that any changes you make to the fields on the left here will not be saved when the dialog is closed. For that reason, it’s a good idea to create your templates in a separate text file and copy and paste them in. Also note that the sample data on the right is only for the first set of rows, not for the whole spreadsheet.
We’re going to generate Turtle using the template, so the next stage is to work out precisely what Turtle to generate. We’ve been working on a small vocabulary for payment data based on the Data Cube vocabulary and that’s what I’ll use here, although it isn’t yet as complete or as available as it will be. We’ll start at the bottom, with the individual rows, and then add extra surrounding information as we go.
Within this data, each row corresponds to a
payment:ExpenditureLine within the dataset. The
expenditure lines can be organised into groups based on the
payment:Payment that they’re associated with, which is
indicated through the ‘TransNo’ column in the spreadsheet. Within the
payment vocabulary we’re using, we can assign individual
expenditure lines to the payment using the
payment:expenditureLine property.
The payment:payer of each
payment:Payment is Windsor & Maidenhead council.
The payment:payee is the ‘Supplier’ listed in the
spreadsheet. The payment:date is the ‘Updated’
date.
Each individual line in the spreadsheet is a
payment:ExpenditureLine which is associated with one
of these payments. The payment:expenditureCode is the
‘Cost Centre’ and the actual
payment:amountExcludingVAT is the ‘Amount excl vat £’
value. Some example Turtle for the first line is thus:
<http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2>
qb:slice <http://www.rbwm.gov.uk/id/transaction/2650750> .
<http://www.rbwm.gov.uk/id/transaction/2650750>
a payment:Payment , qb:Slice ;
rdfs:label "Transaction 2650750"@en ;
qb:sliceStructure payment:payment-slice ;
payment:transactionReference "2650750" ;
payment:payer <http://statistics.data.gov.uk/id/local-authority/00ME> ;
payment:payee <http://www.rbwm.gov.uk/id/supplier/1st-choice-d-b-driveways-limited> ;
payment:date <http://reference.data.gov.uk/id/day/2010-04-09> ;
payment:expenditureLine <http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2#0> .
<http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2#0>
a payment:ExpenditureLine , qb:Observation ;
rdfs:label "Expenditure Line 0"@en ;
qb:dataSet <http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2> ;
payment:expenditureCode <http://www.rbwm.gov.uk/def/cost-centre/LM05> ;
payment:amountExcludingVAT 1875.00 .
That’s the basic data for each line, but there’s also some other information which should be brought out for each line:
In each of these cases, pulling the information out from each line is going to lead to a lot of repetition, because the same payee, date and so on will be described in multiple lines, but we don’t have any choice and we can tidy it up by removing duplicates afterwards. The Turtle for the first line will look like:
<http://www.rbwm.gov.uk/id/supplier/1st-choice-d-b-driveways-limited>
a org:Organization ;
rdfs:label "1st Choice - D B Driveways Limited"@en .
<http://reference.data.gov.uk/id/day/2010-04-09>
a interval:CalendarDay ;
rdfs:label "2010-04-09" ;
time:hasBeginning <http://reference.data.gov.uk/id/gregorian-instant/2010-04-09T00:00:00> ;
interval:ordinalYear 2010 ;
interval:ordinalMonthOfYear 4 ;
interval:ordinalDayOfMonth 9 .
<http://reference.data.gov.uk/id/gregorian-instant/2010-04-09T00:00:00>
a time:Instant ;
time:inXSDDateTime "2010-04-09T00:00:00"^^xsd:dateTime .
<http://www.rbwm.gov.uk/def/cost-centre/LM05>
a rbwm:CostCentre , skos:Concept ;
rdfs:label "Cost Centre LM05"@en ;
rbwm:costCentreCode "LM05"^^rbwm:CostCentreCode ;
rbwm:service <http://www.rbwm.gov.uk/id/service/magnet-leisure-centre> .
<http://www.rbwm.gov.uk/id/service/magnet-leisure-centre>
a rbwm:Service ;
rdfs:label "Magnet Leisure Centre"@en ;
rbwm:providedBy <http://www.rbwm.gov.uk/id/directorate/adult-community-services> .
<http://www.rbwm.gov.uk/id/directorate/adult-community-services>
a rbwm:Directorate ;
rdfs:label "Adult & Community Services"@en ;
org:unitOf <http://statistics.data.gov.uk/id/local-authority/00ME> ;
rbwm:provides <http://www.rbwm.gov.uk/id/service/magnet-leisure-centre> .
<http://statistics.data.gov.uk/id/local-authority/00ME>
org:hasUnit <http://www.rbwm.gov.uk/id/directorate/adult-community-services> .
You’ll see that in the last part of this I’ve introduced some
properties and classes with a rbwm: prefix. These are
for classes and properties that are here in this data, but aren’t
part of the payment vocabulary. The basic schema is:
rbwm:CostCentre a rdfs:Class ;
rdfs:label "Cost Centre"@en ;
rdfs:comment "A cost centre."@en .
rbwm:Service a rdfs:Class ;
rdfs:label "Service"@en ;
rdfs:comment "A service provided by the council."@en .
rbwm:Directorate a rdfs:Class ;
rdfs:label "Directorate"@en ;
rdfs:comment "A directorate within the council"@en .
rbwm:service a rdf:Property , owl:ObjectProperty ;
rdfs:label "Service"@en ;
rdfs:comment "The service associated with a particular cost centre."@en ;
rdfs:domain rbwm:CostCentre ;
rdfs:range rbwm:Service .
rbwm:providedBy a rdf:Property , owl:ObjectProperty ;
rdfs:label "Provided By"@en ;
rdfs:comment "The directorate that provides this service."@en ;
rdfs:domain rbwm:Service ;
rdfs:range rbwm:Directorate .
rbwm:provides a rdf:Property , owl:ObjectProperty ;
rdfs:label "Provides"@en ;
rdfs:comment "A service provided by this directorate."@en ;
rdfs:domain rbwm:Directorate ;
rdfs:range rbwm:Service .
rbwm:costCentreCode a rdf:Property , owl:DatatypeProperty ;
rdfs:label "Cost Centre Code"@en ;
rdfs:comment "The code of this cost centre."@en ;
rdfs:domain rbwm:CostCentre ;
rdfs:range rbwm:CostCentreCode .
rbwm:CostCentreCode a rdfs:Datatype ;
rdfs:label "Cost Centre Code"@en ;
rdfs:comment "A cost centre code consisting of two capital letters followed by two digits."@en .
This illustrates how individual councils might extend the
information that they make available in RDF without having to seek
any kind of prior agreement from anyone else. If, later on, a third
party starts to make available ontologies for cost centres,
services and directorates, Windsor & Maidenhead could start to
link up their RDF with those more widely standardised classes and
properties, with appropriate use of rdfs:subClassOf or
rdfs:subPropertyOf.
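The rbwm:CostCentreCode datatype in the schema is only described in words ("two capital letters followed by two digits"), but that description translates directly into a check you could run over the ‘Cost Centre’ column before exporting. A minimal Python sketch, with a function name of my own choosing:

```python
import re

# Two capital letters followed by two digits, as in the datatype's comment.
COST_CENTRE_CODE = re.compile(r"[A-Z]{2}[0-9]{2}")

def is_cost_centre_code(code):
    """True if the whole string matches the cost centre code pattern."""
    return COST_CENTRE_CODE.fullmatch(code) is not None

print(is_cost_centre_code("LM05"))
print(is_cost_centre_code("lm05"))
```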
Now we have an idea about what data we can extract for a single
row, we can turn this into a Gridworks template. The templates are
fairly straightforward. Wherever you want to insert a value from a
particular column, you use the syntax ${Column Name}.
If you want to do any further processing, you can use the syntax
{{Formula}} to insert the result of a calculation.
<http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2>
qb:slice <${Transaction URI}> .
<${Transaction URI}>
a payment:Payment , qb:Slice ;
rdfs:label "Transaction ${TransNo}"@en ;
qb:sliceStructure payment:payment-slice ;
payment:transactionReference "${TransNo}" ;
payment:payer <http://statistics.data.gov.uk/id/local-authority/00ME> ;
payment:payee <${Supplier URI}> ;
payment:date <${Date URI}> ;
payment:expenditureLine <http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2${Line URI}> .
<http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2${Line URI}>
a payment:ExpenditureLine , qb:Observation ;
rdfs:label "Expenditure Line {{rowIndex}}"@en ;
qb:dataSet <http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2> ;
payment:expenditureCode <${Cost Centre URI}> ;
payment:amountExcludingVAT {{cells['Amount excl vat £'].value + 0}} .
Note that the last line here uses the expression
cells['Amount excl vat £'].value + 0 in order to
ensure that every figure has a decimal place, which makes them into
xsd:decimal values within the resulting RDF.
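To show what the `${Column Name}` substitution amounts to, here is a small Python imitation of it. It only handles the simple column references, not the `{{Formula}}` expressions, and it is a sketch of the idea rather than a description of how Gridworks works internally; `fill_template` is a made-up name.

```python
import re

def fill_template(template, row):
    """Replace every ${Column Name} with that column's value for this row."""
    return re.sub(r"\$\{([^}]+)\}",
                  lambda match: str(row[match.group(1)]),
                  template)

# One row, using the values from the first line of the example Turtle.
row = {
    "TransNo": 2650750,
    "Transaction URI": "http://www.rbwm.gov.uk/id/transaction/2650750",
}
template = '<${Transaction URI}>\n  rdfs:label "Transaction ${TransNo}"@en .'
print(fill_template(template, row))
```

Running the row template once per row and concatenating the results, after the prefix, gives the body of the Turtle file.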
I won’t do the rest of the row template here, though it’s available in full in a separate file.
The other parts of the template are easier to complete. The prefix needs to contain any namespace prefixes that are used within the RDF. It’s also useful to put a base URI here and describe the dataset itself. The RDF for the dataset should contain a number of properties about the dataset as a whole. There are a number of levels at which the dataset can be described:
The Turtle for this description is shown here:
<http://www.rbwm.gov.uk/public/finance_supplier_payments>
a void:Dataset ;
void:subset <http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2> .
<http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2>
a payment:PaymentDataset , void:Dataset ;
# basic metadata
rdfs:label "Windsor & Maidenhead Supplier Payments where charge to specific cost centre is >= £500 for period April 2010 - June 2010"@en ;
dct:license <http://data.gov.uk/id/licence> ;
dct:temporal [
# this time is retrieved from the Last-Modified date on the original spreadsheet
time:hasBeginning <http://reference.data.gov.uk/id/gregorian-instant/2010-08-02T08:37:02>
] ;
# statistical metadata
qb:structure payment:payments-with-expenditure-structure ;
qb:sliceKey payment:payment-slice ;
payment:currency <http://dbpedia.org/resource/Pound_sterling> ;
# linked data metadata
void:exampleResource
<http://www.rbwm.gov.uk/id/transaction/2650750> ,
<http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2#0> ;
void:vocabulary payment: , qb: , rbwm: ;
void:subset [
a void:Linkset ;
void:linkPredicate qb:slice ;
void:subjectsTarget <http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2> ;
void:objectsTarget <http://www.rbwm.gov.uk/id/transaction> ;
] , [
a void:Linkset ;
void:linkPredicate payment:payer ;
void:subjectsTarget <http://www.rbwm.gov.uk/id/transaction> ;
void:objectsTarget <http://statistics.data.gov.uk/id/local-authority> ;
] , [
a void:Linkset ;
void:linkPredicate payment:payee ;
void:subjectsTarget <http://www.rbwm.gov.uk/id/transaction> ;
void:objectsTarget <http://www.rbwm.gov.uk/id/supplier> ;
] , [
a void:Linkset ;
void:linkPredicate payment:date ;
void:subjectsTarget <http://www.rbwm.gov.uk/id/transaction> ;
void:objectsTarget <http://reference.data.gov.uk/id/day> ;
] , [
a void:Linkset ;
void:linkPredicate payment:expenditureLine ;
void:subjectsTarget <http://www.rbwm.gov.uk/id/transaction> ;
void:objectsTarget <http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2> ;
] , [
a void:Linkset ;
void:linkPredicate payment:expenditureCode ;
void:subjectsTarget <http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2> ;
void:objectsTarget <http://www.rbwm.gov.uk/def/cost-centre> ;
] , [
a void:Linkset ;
void:linkPredicate rbwm:service ;
void:subjectsTarget <http://www.rbwm.gov.uk/def/cost-centre> ;
void:objectsTarget <http://www.rbwm.gov.uk/id/service> ;
] , [
a void:Linkset ;
void:linkPredicate rbwm:providedBy ;
void:subjectsTarget <http://www.rbwm.gov.uk/id/service> ;
void:objectsTarget <http://www.rbwm.gov.uk/id/directorate> ;
] , [
a void:Linkset ;
void:linkPredicate rbwm:provides ;
void:subjectsTarget <http://www.rbwm.gov.uk/id/directorate> ;
void:objectsTarget <http://www.rbwm.gov.uk/id/service> ;
] , [
a void:Linkset ;
void:linkPredicate org:hasUnit ;
void:subjectsTarget <http://statistics.data.gov.uk/id/local-authority> ;
void:objectsTarget <http://www.rbwm.gov.uk/id/directorate> ;
] , [
a void:Linkset ;
void:linkPredicate org:unitOf ;
void:subjectsTarget <http://www.rbwm.gov.uk/id/directorate> ;
void:objectsTarget <http://statistics.data.gov.uk/id/local-authority> ;
] .
I’ve described here, verbally, exactly what I’ve done in terms of the cleaning of the data, deriving new columns, and the template that I’ve used to create a Turtle rendition of the data in this spreadsheet. One of the things that we’ve worked hard on within data.gov.uk is finding ways of expressing this provenance information in RDF. There are two reasons for this:
The basic provenance vocabulary that we’re using within data.gov.uk is the Open Provenance Model Vocabulary. This vocabulary talks about Artifacts, Processes that create and use them, and Agents that control those processes. We’ve created an extension of this vocabulary specifically to help describe this kind of scenario, where a spreadsheet is processed using Gridworks and then exported using a template. I’ll put this provenance information in a separate file simply because embedding provenance information, which includes a template, in the template itself gets us into nasty recursion issues.
As well as the template, there are two supplementary artifacts that we need to record the provenance of this data:
The first can be exported using the Project menu.
The second is accessed through the Undo/Redo tab as
shown in the following screenshot:

This tab shows the actions that have been carried out on the
data, and enables you to undo them in sequence. The
extract link at the bottom opens up the dialog shown
in the following screenshot:

You have to manually copy and paste the JSON description from the right of this dialog into a separate file in order to save it.
We can then start describing the provenance of the RDF; this needs to go in the Turtle file itself. We start by saying that the RDF that we’ve created was created from the Gridworks project and through an extraction operation. A simple link to the spreadsheet that was used as the source of the data also provides a quick link back to the original data:
<http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2>
a opmv:Artifact ;
dct:source <http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2.xls> ;
gridworks:wasExportedBy <finance_supplier_payments_2010_q2_provenance#gridworks-export> ;
gridworks:wasExportedFrom <finance_supplier_payments_2010_q2_project.tar.gz> .
The provenance information then needs to describe the export process:
<#gridworks-export>
a gridworks:ExportUsingTemplate , opmv:Process ;
rdfs:label "Process for Exporting Windsor & Maidenhead data as Turtle" ;
gridworks:project <finance_supplier_payments_2010_q2_project.tar.gz> ;
gridworks:template <#gridworks-template> .
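The export step recorded above amounts to emitting a fixed prefix and then rendering a row template once per row. Here is a minimal Python sketch of that idea. The prefix, the row template, the `example.org` URIs and the `Amount` column are all hypothetical stand-ins (the real template text is not reproduced in this post), and Python's `str.format` placeholders stand in for Gridworks' own template expressions.

```python
# Hypothetical stand-ins: the real prefix and row template are not shown here.
PREFIX = "@prefix payment: <http://example.org/payment#> .\n"
ROW_TEMPLATE = (
    '<http://example.org/transaction/{TransNo}> payment:amount "{Amount}" .\n'
)


def export_rows(rows, prefix=PREFIX, row_template=ROW_TEMPLATE):
    """Emit the prefix once, then the row template rendered for each row."""
    body = "".join(row_template.format(**row) for row in rows)
    return prefix + body


# Two made-up rows keyed by column name.
rows = [
    {"TransNo": "1001", "Amount": "250.00"},
    {"TransNo": "1002", "Amount": "75.50"},
]

turtle = export_rows(rows)
print(turtle)
```

Keeping the prefix and the row template as two separate strings mirrors the way the provenance description records them as separate properties of the template.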
The project itself was created from the original Excel spreadsheet. It was generated by an import that ignored a single non-blank header row, followed by the set of operations described by the JSON.
<finance_supplier_payments_2010_q2_project.tar.gz>
a gridworks:Project , opmv:Artifact ;
rdfs:label "Windsor & Maidenhead Supplier Payments April 2010 - June 2010 Gridworks Project"@en ;
gridworks:wasCreatedFrom <http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2.xls> ;
opmv:wasGeneratedBy <#gridworks-processing> .
<#gridworks-processing>
a gridworks:Process , opmv:Process ;
rdfs:label "Processing on the Gridworks Project"@en ;
common:usedData <http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2.xls> ;
gridworks:ignore 1 ;
gridworks:operationDescription <finance_supplier_payments_2010_q2_operations.json> .
<finance_supplier_payments_2010_q2_operations.json>
a gridworks:OperationDescription , opmv:Artifact ;
rdfs:label "Dump of the Processing carried out by Gridworks on Windsor & Maidenhead Supplier Payments April 2010 - June 2010 data"@en ;
gridworks:wasExportedFrom <finance_supplier_payments_2010_q2_project.tar.gz> ;
gridworks:wasExportedBy <#gridworks-operation-description-extraction> .
<#gridworks-operation-description-extraction>
a gridworks:ExtractOperationDescription , opmv:Process ;
rdfs:label "Extraction of the operation description from the Windsor & Maidenhead Supplier Payments April 2010 - June 2010 Project from Gridworks"@en ;
gridworks:project <finance_supplier_payments_2010_q2_project.tar.gz> .
The template is described in terms of the separate parts; in fact it’s useful to use this provenance file as the record of the template that you use, given that Gridworks won’t save the template in the project itself.
<#gridworks-template>
a gridworks:Template , opmv:Artifact ;
gridworks:prefix """
...
"""^^xsd:string ;
gridworks:rowTemplate """
...
"""^^xsd:string .
Gridworks makes it easy to repeat a given set of operations on another spreadsheet that follows the same structure. If you download the Windsor and Maidenhead spending data from 2009 Q4 and import it into Gridworks, you’ll see that it uses the same set of columns as the 2010 Q2 data that we’ve been looking at. (Strangely enough, the 2010 Q1 data doesn’t quite follow the same structure as it doesn’t include the ‘TransNo’ column.)
There are a couple of differences:
Edit Cells... > Transform to change these values into dates using the toDate(value) formula
toNumber(replace(value, ',', '')) to rectify this
You might want to do some more cleaning, for example to check for duplicates, but once that is done, you use the apply link at the bottom of the Undo/Redo tab to apply the JSON operation description that you extracted from the previous spreadsheet to this one. The templates require only a little tweaking to give different filenames and labels, but otherwise can be used as-is.
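For readers more at home in a scripting language, the two formulas mentioned above have close Python analogues. This is only a sketch: the column names `Date` and `Amount` and the day/month/year date format are assumptions for illustration, not taken from the spreadsheet.

```python
from datetime import date, datetime


def to_number(value: str) -> float:
    """Rough analogue of toNumber(replace(value, ',', ''))."""
    return float(value.replace(",", ""))


def to_date(value: str, fmt: str = "%d/%m/%Y") -> date:
    """Rough analogue of toDate(value); the date format is an assumption."""
    return datetime.strptime(value.strip(), fmt).date()


def clean_row(row: dict) -> dict:
    """Return a copy of the row with the two assumed columns converted."""
    cleaned = dict(row)
    cleaned["Amount"] = to_number(row["Amount"])
    cleaned["Date"] = to_date(row["Date"])
    return cleaned


# A made-up row, not real spending data.
example = clean_row({"TransNo": "1001", "Date": "07/10/2009", "Amount": "1,234.50"})
print(example)
```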
So while the process of cleaning data, deriving values and creating a template for exporting as Turtle is a bit of effort, the likelihood is that you will be able to repeat the same operations on similar data with a minimal amount of work.
Gridworks is a simply amazing tool for data cleansing, analysis and, as we’ve seen, transformation. It’s set to become more so for our purposes in the near future, as it comes to support the mapping of names for things to URIs using configurable reconciliation services (which might allow it to automatically map Government Department names to URIs, for example), and the creation of RDF using a more intuitive and user-friendly approach than the templates that I’ve illustrated here.
Of course there are issues, particularly for UK civil servants who typically have to operate on locked-down machines running IE7 (if they’re lucky). Gridworks also only deals with the fairly simple cases of data that fits in a spreadsheet-like structure, without the complexities of annotations on rows, columns or individual cells that we often see in government data.
Nevertheless, there’s huge potential here to provide a fairly easy route to the publication of linked data for people who are familiar with spreadsheets, in particular one that can be tweaked and extended to allow for the variety and complexity of real-world data.
Posted at 22:23
I just released a new version of my Rasqal RDF Query Library for two main new features:
The main change is the start of additions to Rasqal’s APIs and query engine for the new SPARQL 1.1 working drafts. This release adds support for the syntax of all the changes for Query and Update. The new draft syntax is available via the ‘laqrs’ query language name, until the SPARQL 1.1 syntax is finalized. The ‘sparql’ query language provides SPARQL 1.0 support.
On Query 1.1, the addition is primarily syntax and API support
for the new syntax. There is expression execution for the new
functions IF(), URI(),
STRLANG(), STRDT(), BNODE(),
IN() and NOT IN(), which are now usable
as part of the normal expression grammar. The existing aggregate
function support was extended to add the new SAMPLE()
and GROUP_CONCAT(), but remains syntax-only. Finally,
the new GROUP BY with HAVING conditions
was added to the syntax, with consequent API updates, but there is no
query engine execution of them.
For Update 1.1, the full syntax for the update operations was
added, and the operations create API structures. Note, however, that there seem to
be some ambiguities in the draft syntax, especially around multiple
optional tokens in a row near WITH, which are
particularly hard to implement in flex and bison (aka “lex and
yacc”).
The main non-SPARQL 1.1 related change is to allow building
Rasqal with Raptor V2 APIs rather than V1. Raptor V2 is in beta, so
this is not a final API and is thus not the default build; it has
to be enabled by passing --with-raptor2 to configure.
When Raptor V2 is stable (2.0.0), Rasqal will require it.
The changes to Rasqal in this release, in summary, are:
HAVING expressions.
--with-raptor2.
See the Rasqal 0.9.20 Release Notes for the full details of the changes.
Get it at http://download.librdf.org/source/rasqal-0.9.20.tar.gz.
PS The source code control has also moved to Git and is hosted at GitHub.
Posted at 21:33
Brass ensemble remake of Spottieottie..etc.. from one of the earlier classic Outkast albums. 'Nuff said.
Posted at 20:09
Shed a tear of delight; don't you worry about a fall tonight
Birds flying free; What about you and me
Ooh!

Take some time to let your feelings flow free
You can't hide away from what you'll be
Search the sky for new horizons to unfold
Set yourself on the oceans of dreams to behold
—from "Take Some Time" by Ndugu & The Chocolate Jam Company
❧
I remember hearing this slow jam a couple of times at dances in Nigeria in the early 80s. When Erykah Badu flipped it for "Ummm Hmmm" off her latest masterpiece New Amerykah Part Two (Return of the Ankh), she put a weeks-long itch in my skull, and I bet a lot of others who had grown up on a soul diet. I finally twigged it last week, and went to hunt down the Ndugu & The Chocolate Jam Company original, but it seems to have faded into the mists of the past a bit, which is a true shame. I did find the following audio version on YouTube, though.
Here is Badu's "Ummm Hmmm," accompanied by some lovely stills of Fat Belly Bella herself.
Of course Badu wasn't the first to discover the great sample possibilities of the Leon Chancler (AKA "Ndugu") jam. DJ Premier used it back in '07 for the NYG'z project song "Welcome To G-Dom."
Of course, I love me some Primo, but Erykah pwned this bitch. It's over. I hope no other DJs think they should dare follow her.
Then again I'm thinking of using the Primo loop to back a poem recital one day. And maybe I have just the poem. Having learned about the terzanelle form from Heather Fowler a few weeks ago, I fell in love with the form, and I've been writing a sequence of terzanelles, one for each song on New Amerykah Part Two. I'm on "Ummm Hmmm" and the first few stanzas of my poem are as follows:
Take some time to let your feelings run free
Heart's desire—thump! thump! I've been here before—
You can't hide away from what you'll be.

You can't hide; don't cheat I've been keeping score.
Place your bet, love; scared money don't make none.
Heart's desire—thump! thump! I've been here before.

Truth and Icarus dare, the money sun,
Angel bird, let's jump off into your world.
Place your bet, love; scared money don't make none.
Naturally it includes elements from Ndugu's song, as well as
Badu's. I can't find the lyrics to "Take Some Time" anywhere
on the Net so I, ah, took some time to transcribe them
myself. As you can see from the square brackets and the
ellipses, there are some parts I can't figure out right now, but I
think I got most of it.
"Take Some Time" by Ndugu & The Chocolate Jam Company
Do you always conceal what you feel inside
Man does not ever drift with the flow of the tide
Makes it hard to see when [it attracts you and me]
And there comes a time when your feelings should run free

And understand that you're over me when you're ...
It'll lend you a helping hand when your [crimes] cross the tide
Takes you high in the sky of your heart's desire
Float through the valley of love; you'll start to fly
Like a bird in the sky who's just learned to fly
Makes you feel so proud you might want to cry

Shed a tear of delight; don't you worry about a fall tonight
Birds flying free; What about you and me
Ooh!

Take some time to let your feelings flow free
You can't hide away from what you'll be
Search the sky for new horizons to unfold
Set yourself on the oceans of dreams to behold

Well you're pride by your side when we're looking on
Keep your head to the sky through the weather of the storm

Take the compliment as if it came heaven-sent
From someone up above [with music] with love

What's the nature of your mind when the trouble starts to [grind]
Do you leave yourself behind, not to be caught up on the line
Signs of life is a lot to see that you hold in your [belief]

Free your time; what about your mind
Wow!

Take some time to let your feelings flow free
You can't hide away from what you'll be
Search the sky for new horizons to unfold
Set yourself on the oceans of dreams to behold

You will find further on down the line
Is what you've got to do, to see you through

Take some time to let your feelings flow free
You can't hide away from what you'll be
Search the sky for new horizons to unfold
Set yourself on the oceans of dreams to behold

Take some time to let your feelings flow free
You can't hide away from what you'll be
Search the sky for new horizons to unfold
Set yourself on the oceans of dreams to behold
Posted at 20:09
I implemented dimensions.py perhaps eight years ago as an exercise and have used it occasionally ever since.
It allows doing math with dimensioned values in order to automate unit conversions (you can add m/s to mile/hour) and dimensional checking (you can't add m/s to mile/lightyear). It specifically does not convert 212F to 100C but rather will convert 9F to 5C (valid when converting temperature differences).
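To make the idea concrete, here is a minimal sketch of dimension-checked arithmetic. It is not the dimensions.py source: the `Quantity` class, its `to` method and the tiny unit table are hypothetical, and it only parses units of the form 'a' or 'a/b'. Each unit carries a scale factor to a base unit plus a map of dimension exponents; addition is allowed only when the exponent maps match, and F and C are treated purely as temperature differences.

```python
# Hypothetical unit table: unit -> (scale to base unit, dimension exponents).
UNITS = {
    "m": (1.0, {"m": 1}),
    "mile": (1609.344, {"m": 1}),
    "lightyear": (9460730472580800.0, {"m": 1}),
    "s": (1.0, {"s": 1}),
    "hour": (3600.0, {"s": 1}),
    "C": (1.0, {"K": 1}),        # temperature difference only
    "F": (5.0 / 9.0, {"K": 1}),  # temperature difference only
}


class Quantity:
    """A value with a unit of the form 'a' or 'a/b' (illustrative only)."""

    def __init__(self, value, unit):
        numerator, _, denominator = unit.partition("/")
        scale, dims = UNITS[numerator]
        dims = dict(dims)
        if denominator:
            den_scale, den_dims = UNITS[denominator]
            scale /= den_scale
            for base, exponent in den_dims.items():
                dims[base] = dims.get(base, 0) - exponent
        self.base_value = value * scale
        # Drop cancelled dimensions so that m/mile compares as dimensionless.
        self.dims = {b: e for b, e in dims.items() if e != 0}

    def __add__(self, other):
        if self.dims != other.dims:
            raise TypeError(f"cannot add {self.dims} to {other.dims}")
        result = Quantity.__new__(Quantity)
        result.base_value = self.base_value + other.base_value
        result.dims = dict(self.dims)
        return result

    def to(self, unit):
        """Return the numeric value expressed in the given unit."""
        target = Quantity(1.0, unit)
        if target.dims != self.dims:
            raise TypeError(f"cannot convert {self.dims} to {target.dims}")
        return self.base_value / target.base_value


# Adding m/s to mile/hour works because both are length per time.
print((Quantity(1, "m/s") + Quantity(1, "mile/hour")).to("m/s"))

# A difference of 9 F expressed as a difference in C.
print(Quantity(9, "F").to("C"))

# Adding m/s to mile/lightyear fails the dimensional check.
try:
    Quantity(1, "m/s") + Quantity(1, "mile/lightyear")
except TypeError as error:
    print("rejected:", error)
```

Storing exponents in a dictionary is one simple way to do the check; it would also allow fractional exponents if the values were fractions rather than integers.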
It is similar to unums (http://home.scarlet.be/be052320/Unum.html) but with a significant difference:
I used a different syntax Q(25,'m/s') as opposed to 100*m/s (I recall not wanting to have all the base SI units directly in the namespace). I'm not entirely sure which approach is really better.
I also had a specific need to have fractional exponents on units, allowing the following:
>>> km=Q(10,'N*m/W^(1/2)')
>>> km
Q(10.0, 'kg**0.5*m/s**0.5')
Looking back I see a few design decisions I might do differently today, but I'll share it anyway.
Some examples are in the source below the line with if __name__ == "__main__":
Note that I've put two files into the code block below, dimensions.py and dimensions.data, so please cut them apart if you want to try it.
Very impressive library. I recently incorporated the use of the Measurement Unit Ontology into the Computer-based Patient Record (CPR) ontology and (on the surface) it seems like a library like this can provide the unit conversion machinery for RDF instances that use such a framework.
Posted at 20:09
Spotted this new Talib Kweli song, called the Ballad of the
Black Gold in a hypem
link (you can watch the video there). Very timely given
the recent BP mess. Much respect to Talib for going into some of
the history of Oil politics in Nigeria; an excerpt from Verse 2 is
below:

Nigeria is celebrating 50 years of independence
They still feel the colonial effects of Great Britain's presence
Dictators quick to imitate the West
Got in bed with oil companies and now the place is a mess
Take a guess, which ones came and violated
They oiled up the soil, the Ogoni people was almost annihilated
But still they never stayed silent
They was activists and poets using non-violent tactics
That was catalyst for soldiers to break into they crib
Take it from the kids and try to break'em like a twig
And make examples of the leaders; executed Saro-Wiwa,
Threw Fela's mom out the window right after they beat her
In an effort to defeat hope. Now the people's feet soaked in oil [?]
So the youth is doing drive-bys through speed boats [?]
They kidnap the workers, they blowing up the pipelines
You see the fires glowing in the nighttime
Posted at 20:09

Well, I pretty much knew it was going to happen as soon as they were bounced out of the playoffs. This poster downtown is going to look real stupid tomorrow. At least he didn't do it for the money. Cleveland folks should show some respect to how much he elevated our game. We hadn't been contenders since the days of Larry Nance; remember those Cavs?
Posted at 20:09