Saturday, November 2, 2013

Recovering from Binge Hacking



TheHackerCIO has been on a Hacking binge. He has no problem with eating binges. Nor with drinking binges. But coding is another matter!  A client had to submit to the Google Play store, so he was busy getting the code ready to ship. Late into the night.

It's a shame, but hacking trumps blogging!

Any day.

Just before the Halloween scare -- that 1,000+ line method -- I was able to do my evening Hacking (partially) at a Meetup about Scala.

The Meetup featured a dual perspective: it was about Scala, but also about Hulu's experience with running Hadoop in conjunction with Scala.

So Hulu has a business somewhat similar to Netflix, in streaming video. They therefore have an interest in Big Data analysis of their viewers' habits. They implement this by -- as the song says --

"Every move you make ...
Every step you take ....
I'll be watching you"

Actually, the "watching" is done by the client's production of "beacons", which are sent back to the server, collected, and run through the Hadoop cluster for Big Data analysis. We're talking 4.5 million subscribers, 25 million uniques/month, and 1 billion beacons/day.

The Hadoop cluster is basically a Data Warehouse on Steroids! It's a 24 x 7 pipeline that analyzes the beacons on an hourly basis. The Scala portion runs the Scheduler that feeds the Hadoop cluster, and ensures that basic dependencies are met before a job starts.

The Hadoop cluster has its own scheduler that handles load balancing and more detailed scheduling concerns.
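To make the idea of "dependencies met prior to job initiation" concrete, here is a minimal sketch of that kind of gate. It is not Hulu's scheduler: the names Job, DependencyGate, and readyJobs are hypothetical, and the only assumption is that a job may start once everything it depends on has completed.

```scala
// Hypothetical model: a job and the names of the jobs it depends on.
final case class Job(name: String, dependsOn: Set[String])

object DependencyGate {
  // A job is ready when it has not run yet and every one of its
  // dependencies appears in the set of completed job names.
  def readyJobs(jobs: Seq[Job], completed: Set[String]): Seq[Job] =
    jobs.filter { j =>
      !completed.contains(j.name) && j.dependsOn.subsetOf(completed)
    }
}
```

Given jobs "load-beacons", "sessionize" (which depends on "load-beacons"), and "hourly-report" (which depends on "sessionize"), only "load-beacons" is ready at the start, and "sessionize" becomes ready once "load-beacons" is in the completed set.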

A few points of interest about Scala -- which I know very little about. Apparently Slick, a mechanism for database access within Scala's functional approach, has a hard limit of 22 columns! It appears that there is a new kind of impedance mismatch going on here! Instead of the impedance mismatch between the Relational model and the Object Oriented model, Scala has an impedance mismatch between the Relational model and the Functional Programming model!

The really funny thing is that over two decades ago there were attempts to fashion languages without an impedance mismatch, but they never got the market traction necessary to triumph. On the one hand, there were Object databases; on the other hand, there were 4GLs, such as Progress, which were built around the notion of Set-oriented operations. The object databases stored C++ objects or Java objects directly in the object store, without the need for an ORM such as Hibernate to do the translation. The Progress 4GL had commands that would operate over entire result sets. I really think that such a Set-oriented library would be an excellent addition to the Java ecosystem. Might be a good open source project.

The presenter had a workaround for the 22-column restriction. Called slicking, it's available on GitHub here.
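For background, the ceiling of 22 comes from Scala itself: in the Scala 2 line, tuples and function types stop at Tuple22 and Function22, and a row mapped to a tuple inherits that limit. One general way around it, not necessarily what slicking does, is to split a wide row into smaller groups and nest them. The sketch below uses plain Scala with no Slick calls; Identity, Playback, BeaconRow, and WideRow are hypothetical names, and the four columns stand in for a much wider table.

```scala
// Hypothetical column groups; each group stays well under 22 fields.
final case class Identity(userId: Long, deviceId: String)
final case class Playback(videoId: Long, positionSec: Int)

// The full row is a nesting of groups rather than one flat tuple.
final case class BeaconRow(identity: Identity, playback: Playback)

object WideRow {
  // Build a row from a nested tuple (one inner tuple per column group).
  def fromNested(t: ((Long, String), (Long, Int))): BeaconRow = t match {
    case ((userId, deviceId), (videoId, positionSec)) =>
      BeaconRow(Identity(userId, deviceId), Playback(videoId, positionSec))
  }

  // Flatten a row back into the same nested-tuple shape.
  def toNested(r: BeaconRow): ((Long, String), (Long, Int)) =
    ((r.identity.userId, r.identity.deviceId),
     (r.playback.videoId, r.playback.positionSec))
}
```

Because no single tuple or case class grows past a handful of fields, the same pattern extends to tables with far more than 22 columns.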

Another interesting discussion was on the use of Parser combinators to create external-facing DSLs.
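For readers who have not met parser combinators, the idea is to build a parser for a whole language out of tiny parsers glued together with a few operators. The sketch below is hand-rolled so that it stays self-contained; it is not what the presenter showed, and the names Parser, Parsers, ScheduleDsl, and the "every N minutes" mini-language are all made up for illustration. Scala's own scala.util.parsing.combinator library offers a far richer version of the same technique.

```scala
// A parser consumes a prefix of the input and, on success, returns
// the parsed value together with the remaining input.
final case class Parser[A](run: String => Option[(A, String)]) {
  // Transform the parsed value.
  def map[B](f: A => B): Parser[B] =
    Parser(in => run(in).map { case (a, rest) => (f(a), rest) })

  // Sequence: run this parser, then the next one on what is left.
  def ~[B](next: => Parser[B]): Parser[(A, B)] =
    Parser(in => run(in).flatMap { case (a, rest) =>
      next.run(rest).map { case (b, rest2) => ((a, b), rest2) }
    })

  // Alternative: try this parser, fall back to the other on failure.
  def |(other: => Parser[A]): Parser[A] =
    Parser(in => run(in).orElse(other.run(in)))
}

object Parsers {
  // Matches an exact string at the start of the input.
  def literal(s: String): Parser[String] =
    Parser(in => if (in.startsWith(s)) Some((s, in.drop(s.length))) else None)

  // Matches one or more ASCII digits (no overflow handling in this sketch).
  val number: Parser[Int] = Parser { in =>
    val digits = in.takeWhile(c => c >= '0' && c <= '9')
    if (digits.isEmpty) None else Some((digits.toInt, in.drop(digits.length)))
  }
}

// Hypothetical external DSL: "every 5 minutes" or "every 2 hours".
object ScheduleDsl {
  import Parsers._

  // A time unit parses to the number of seconds it represents.
  val timeUnit: Parser[Int] =
    literal("minutes").map(_ => 60) | literal("hours").map(_ => 3600)

  // The whole phrase parses to an interval in seconds.
  val every: Parser[Int] =
    (literal("every ") ~ number ~ literal(" ") ~ timeUnit).map {
      case (((_, n), _), secondsPerUnit) => n * secondsPerUnit
    }

  // Succeeds only if the whole input was consumed.
  def parse(s: String): Option[Int] =
    every.run(s).collect { case (seconds, "") => seconds }
}
```

With these definitions, ScheduleDsl.parse("every 5 minutes") yields Some(300), while input that does not fit the grammar, such as "every day", yields None.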

And even more interesting to TheHackerCIO, was a discussion about Scala Macros coming out in 2.11. Apparently, these in conjunction with IntelliJ IDEA, allow the IDE to look up the actual database columns you need and show them in auto-completion. This means, for those of you who do less coding, that you can see as you type what the possible tables are, select one for your code, and then see the possible columns you might need, and select the one you want. Then the IDE generates all the glue code you need to actually execute this code. So, in essence, the macro manages the impedance mismatch we spoke of earlier! Now that is way cool. Especially as I just finished clicking open a new window and querying the database to see the schema in order to enter my code correctly. The time wasted on such continual look-ups adds up.

Looking Forward to Doing More with Scala,

TheHackerCIO
