Saturday, November 2, 2013

Recovering from Binge Hacking



TheHackerCIO has been on a Hacking binge. He has no problem with eating binges. Nor with drinking binges. But coding is another matter!  A client had to submit to the Google Play store, so he was busy getting the code ready to ship. Late into the night.

It's a shame, but hacking trumps blogging!

Any day.

Just before the Halloween scare -- that 1,000+ line method -- I was able to do my evening Hacking (partially) at a Meetup about Scala.

The Meetup featured a dual perspective: it was about Scala, but also about Hulu's experience with running Hadoop in conjunction with Scala.

So Hulu has a business somewhat similar to Netflix, in streaming video. They therefore have an interest in Big Data analysis of their viewers' habits. They implement this by -- as the song says --

"Every move you make ...
Every step you take ....
I'll be watching you"

Actually, the "watching" is done by the client's production of "beacons", which are sent back to the server, collected, and run through the Hadoop cluster for Big Data analysis. We're talking 4.5 Million subscribers; 25 Million uniques/month; 1 Billion beacons/day.

The Hadoop cluster is basically a Data Warehouse on Steroids! It's a 24 x 7 pipeline of analysis of the beacons on an hourly basis. The Scala portion is used to run the Scheduler that feeds into the Hadoop cluster, and is used to ensure that basic dependencies are met prior to job initiation.
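To make the "dependencies met prior to job initiation" idea concrete, here is a minimal sketch in Python (not Scala, and certainly not Hulu's actual scheduler). The job names and the `ready_jobs` helper are hypothetical, invented purely for illustration.

```python
from typing import Dict, List, Set


def ready_jobs(dependencies: Dict[str, Set[str]], completed: Set[str]) -> List[str]:
    """Return the jobs that have not run yet and whose prerequisites are all complete."""
    return sorted(
        job
        for job, needs in dependencies.items()
        if job not in completed and needs <= completed
    )


# Hypothetical hourly pipeline: each job lists the jobs it depends on.
deps = {
    "collect_beacons": set(),
    "sessionize": {"collect_beacons"},
    "hourly_report": {"sessionize"},
}

print(ready_jobs(deps, set()))                # only the job with no prerequisites
print(ready_jobs(deps, {"collect_beacons"}))  # the next stage becomes eligible
```

A gate like this only decides what may be submitted; how the work is spread across the cluster is a separate concern.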

The Hadoop cluster has its own scheduler that handles load balancing and more detailed scheduling concerns.

A few points of interest about Scala -- which I know very little about. Apparently Scala Slick, a mechanism for database access within Scala's functional approach, has a hard limit of 22 columns! It appears that there is a new kind of impedance mismatch going on here! Instead of the impedance mismatch between the Relational model and the Object Oriented model, Scala has an impedance mismatch between the Relational model and the Functional Programming Model!

The really funny thing is that over two decades ago there were attempts to fashion languages without an impedance mismatch, but they never got the market traction necessary to triumph. On the one hand, there were Object databases; on the other hand, there were 4GLs, such as Progress, which were built around the notion of Set-oriented operations. The object databases stored C++ objects or Java objects directly in the object store, without the need for an ORM such as Hibernate to do the translation. The Progress 4GL had commands that would operate over entire result sets. I really think that such a Set-Oriented library would be an excellent addition to the Java ecosystem. Might be a good open source project.

The presenter had a workaround for the 22-column restriction. Called slicking, it's available on GitHub here.

Another interesting discussion was on the use of Parser combinators to create external-facing DSLs.
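For anyone who hasn't met parser combinators: the idea is to build big parsers by gluing small ones together with ordinary functions. The talk was about Scala's version; what follows is only a rough Python sketch of the technique, and the tiny "every N minutes" DSL and all the helper names (`token`, `seq`, `alt`, `fmap`) are made up for illustration.

```python
import re
from typing import Any, Callable, Optional, Tuple

# A parser takes (text, position) and returns (value, new_position), or None on failure.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]


def token(pattern: str) -> Parser:
    """Match a regex at the current position, skipping leading whitespace."""
    compiled = re.compile(r"\s*(" + pattern + ")")

    def parse(text: str, pos: int):
        m = compiled.match(text, pos)
        return (m.group(1), m.end()) if m else None

    return parse


def seq(*parsers: Parser) -> Parser:
    """Run parsers one after another; fail if any of them fails."""
    def parse(text: str, pos: int):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos

    return parse


def alt(*parsers: Parser) -> Parser:
    """Try each parser in turn and return the first success."""
    def parse(text: str, pos: int):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None

    return parse


def fmap(parser: Parser, fn: Callable[[Any], Any]) -> Parser:
    """Transform the value produced by a successful parse."""
    def parse(text: str, pos: int):
        result = parser(text, pos)
        if result is None:
            return None
        value, new_pos = result
        return fn(value), new_pos

    return parse


# Hypothetical DSL: "every 5 minutes" or "every 2 hours".
number = fmap(token(r"\d+"), int)
unit = alt(token("minutes"), token("hours"))
schedule = fmap(seq(token("every"), number, unit), lambda v: (v[1], v[2]))

print(schedule("every 5 minutes", 0))
```

The grammar at the bottom reads almost like the DSL it parses, which is the appeal for external-facing languages.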

And even more interesting to TheHackerCIO was a discussion about Scala Macros coming out in 2.11. Apparently, these, in conjunction with IntelliJ IDEA, allow the IDE to look up the actual database columns you need and show them in auto-completion. This means, for those of you who do less coding, that you can see as you type what the possible tables are, select one for your code, and then see the possible columns you might need, and select the one you want. Then the IDE generates all the glue code you need to actually execute this code. So, in essence, the macro manages the impedance we spoke of earlier! Now that is way cool. Especially as I just finished clicking open a new window and querying the database to see the schema in order to enter my code correctly. The amount of time wasted by such continual look-ups adds up over time.

Looking Forward to Doing More with Scala,

TheHackerCIO

Thursday, October 31, 2013

Scary Halloween Tale


We looked at some code early this morning and one method was over 1,000 lines. That's not counting the helper methods it called. Now that's what I call scary...

I Remain,

TheHackerCIO

Wednesday, October 30, 2013

Quick Point on Cucumber


Last night we covered the next two chapters of The Cucumber book. These were on Gherkin -- the syntax for specifying Scenarios. It's kind of amazing that the keywords are essentially meaningless! You can actually use an "*" in place of any keywords, with exactly the same results.

We also determined one reason why "the enemy" disliked it. There is no namespace capability within Cucumber. Consequently, it becomes more cumbersome as the number of Scenarios and Features grows. At least it throws an error if an ambiguous step definition is created. But as the number of scenarios increases, the time necessary to run also increases. Which is to say, it doesn't scale well.
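Both points (the keyword doesn't matter, and everything lives in one flat namespace) can be shown with a toy step registry. This is a simplified Python sketch, not Cucumber's actual implementation; `StepRegistry`, `AmbiguousStep`, and the cukes steps are hypothetical names chosen for the example.

```python
import re
from typing import Callable, List, Tuple

KEYWORDS = ("Given", "When", "Then", "And", "But", "*")


class AmbiguousStep(Exception):
    """Raised when more than one step definition matches a step's text."""


class StepRegistry:
    def __init__(self) -> None:
        # One flat, global list: there are no namespaces to keep definitions apart.
        self._steps: List[Tuple[re.Pattern, Callable]] = []

    def step(self, pattern: str):
        """Decorator that registers a step definition under a regex."""
        def register(fn: Callable) -> Callable:
            self._steps.append((re.compile(pattern), fn))
            return fn
        return register

    def run(self, line: str):
        # The leading keyword is dropped before matching, so it carries no meaning.
        keyword, _, text = line.strip().partition(" ")
        if keyword not in KEYWORDS:
            raise ValueError(f"not a step: {line!r}")
        candidates = [(p.fullmatch(text), fn) for p, fn in self._steps]
        matches = [(m, fn) for m, fn in candidates if m]
        if len(matches) > 1:
            raise AmbiguousStep(text)
        if not matches:
            raise LookupError(f"undefined step: {text!r}")
        m, fn = matches[0]
        return fn(*m.groups())


registry = StepRegistry()


@registry.step(r"I have (\d+) cukes")
def have_cukes(count: str) -> int:
    return int(count)


print(registry.run("Given I have 3 cukes"))
print(registry.run("* I have 3 cukes"))  # same step definition, same result
```

Register a second pattern such as `I have (.+) cukes` anywhere in the suite and every matching step becomes ambiguous, which is exactly the kind of collision namespaces would prevent.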

Next week -- on Tuesday,  will be the LAJUG main meeting & I hope to see everyone there!

Also next week -- on Monday, will be the Technology Radar Group & I hope to see everyone there as well. :-)

I Remain,

TheHackerCIO

Tuesday, October 29, 2013

New Wine or Technology In Old Bottles!

"Neither do men put new wine into old bottles: else the bottles break, and the wine runneth out, and the bottles perish: but they put new wine into new bottles, and both are preserved." [ref]

New technology used in old ways also breaks them and leads to pathological systems. The best summary and critique of this pathology, as it exists in Architecture (not Software or Enterprise Architecture, but real, putting-up-buildings Architecture), is this passage, about the Parthenon:
"The famous flutings on the famous columns---what are they there for? To hide the joints in wood--when columns were made of wood, only these aren't, they're marble. The triglyphs, what are they? Wood. Wooden beams, the way they had to be laid when people began to build wooden shacks. Your Greeks took marble and they made copies of their wooden structures out of it, because others had done it that way. Then your masters of the Renaissance came along and made copies in plaster of copies in marble of copies in wood. Now here we are making copies in steel and concrete of copies in plaster of copies in marble of copies in wood. Why?"
Unfortunately, this happens all too often in our world of technology. Consider, as an example, one great Bloated Behemoth Enterprise, whose technology needs were well in place and large-scale even thirty or forty years ago, in the days when Mainframes bearing Tape-drives ruled the earth. 

A computer-historical perspective is helpful here, and luckily TheHackerCIO has spent some time both talking to the old-timers (one goes to our local Users Group) and reading about the bad-old-days. In those ancient times, programs were structured around the tape based file system. A typical program would read, as input, the Customer Master tape, which contained an entry for every customer, and a second tape input -- let's say a New Orders tape -- containing a row for each new order needing processing, and already sorted by customer Id. 

The program would then read a customer order, advance the Customer Master file until it located this customer, pull out the data it needed to complete the order, and then record the finalized purchases on yet another tape. Note that when building a new system in this kind of ecosystem, one must always source things from existing files, stored on tapes. At the end of a job run, a new tape has been produced, which is input to the next job. And these jobs are all run on a particular schedule, carefully contrived, and supervised by the operators.
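For readers who never saw a tape job, the processing pattern just described is the classic sequential match-merge. Here is a minimal Python sketch of it, with iterables standing in for tapes; the record layouts and the `process_orders` name are assumptions for illustration, not anything from a real system.

```python
from typing import Iterable, Iterator, Tuple


def process_orders(
    master: Iterable[Tuple[int, str]],
    orders: Iterable[Tuple[int, str]],
) -> Iterator[Tuple[int, str, str]]:
    """Match-merge two 'tapes', both sorted by customer id.

    Each input is read strictly forward, exactly once, as a tape would be.
    """
    master_iter = iter(master)
    current = next(master_iter, None)
    for customer_id, item in orders:
        # Advance the Customer Master tape until we reach this order's customer.
        while current is not None and current[0] < customer_id:
            current = next(master_iter, None)
        if current is None or current[0] != customer_id:
            raise KeyError(f"order for unknown customer {customer_id}")
        # Write one record to the output 'tape'.
        yield customer_id, current[1], item


customer_master = [(1, "Ada"), (2, "Bob"), (4, "Cy")]
new_orders = [(2, "widget"), (2, "gadget"), (4, "gizmo")]

for record in process_orders(customer_master, new_orders):
    print(record)
```

Note that the whole scheme depends on the orders having been sorted by customer id in an earlier job; there is no going backwards on the master.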

Now enter the Relational Database. The point of this technological innovation, and the internal genius of its principles,  was to have a Master Database for the enterprise. Instead of files, all data would be stored in tables. Now, updates could be made transactionally to the system of record in real time, at the same time as others were querying that same data to determine how things were changing. To take our example, as new orders were placed, it was now possible to obtain the necessary data from the CUSTOMER Table, and create a row in a NEW_ORDERS Table to handle it. As the RDBMS evangelists put things, this tool allowed for:

  • reduced data redundancy
  • increased data availability
  • increased data security
And there was a whole methodology for properly "normalizing" the data, and consequently eliminating the double-update problem, and eliminating concern with determining the system-of-record, by replacing it with one universal system. 
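Here is roughly what the relational version of the order example looks like, sketched with Python's built-in `sqlite3` module and an in-memory database. The CUSTOMER and NEW_ORDERS table names come from the example above; the columns and the `place_order` helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE new_orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(id),"
    " item TEXT NOT NULL)"
)
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ada')")
conn.commit()


def place_order(conn: sqlite3.Connection, customer_id: int, item: str) -> str:
    """Look up the customer and record the order in a single transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        row = conn.execute(
            "SELECT name FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        if row is None:
            raise LookupError(f"unknown customer {customer_id}")
        conn.execute(
            "INSERT INTO new_orders (customer_id, item) VALUES (?, ?)",
            (customer_id, item),
        )
    return row[0]


place_order(conn, 1, "widget")

# The customer's name lives in one place only; orders refer to it by key.
rows = conn.execute(
    "SELECT c.name, o.item FROM new_orders o JOIN customer c ON c.id = o.customer_id"
).fetchall()
print(rows)
```

No unload job, no second copy of the customer data: the order row simply points at the one customer row, which is the normalization point made above.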

Unfortunately, the Bloated Behemoth Enterprise dealt with Databases differently. They saw them as Yet-Another-Tape-Based-File System. 

And so, each new development project, at its commencement, began life by creating its own database, just as they would have defined a new Tape-File layout. They sourced them by creating unload jobs, just as they would have unloaded a Tape and loaded the data into their new Tape File layout. They put new wine into old bottles.

And after thirty years of such accretions, they now have tens of thousands of databases, in every dialect of SQL possible, scattered across myriad platforms, with a tangled web of sourcing one database from the unloaded output of another, all kept in several "AutoSys" style batch-job schedulers, so that the proper, and necessary order of loading can take place. 

Which is to say -- for those unacquainted with this kind of BBE -- that at a particular time in the evening (typically midnight) the online systems are brought down, the databases are all quiesced, backups are taken, then crucial batch jobs run, many of which consist of unload jobs to extract data from one database, just as if it were a Tape, and load up another. 

As time has progressed, this batch window grows longer and longer, to progressively consume the evening -- not to mention the disk space available. 

This is a perfect example of the pathology of putting new technology into old bottles. 

And, en passant, it is an example of why Architecture must never adopt the "timeless (and thoughtless) way of building" that merely tinkers with using new things in the same old way, without spending the time to ensure that a proper understanding of the new way is properly adopted and promulgated. 

Consider this a Cautionary Tale! Always seek to know and find the inner logic of a new technology. Always seek to ensure that new wine gets the proper new bottle it needs. Otherwise, you'll want to get drunk when you see the results.

I Remain,

TheHackerCIO



Monday, October 28, 2013

Software Architecture



Today, even the notion that there should be software architecture is up for debate! Regardless of what verdict one arrives at,  it is a good thing to examine whether something should exist prior to investing time studying it.

[editorial note: this post commences an occasional series on Software and Enterprise Architecture, in which I hope to collect all the principles into one convenient place. We begin at the beginning: should there even BE a discipline of software architecture? ]

There are a number of "nay-sayers." But I think they can all be classified as variants on the notion of "evolutionary design," which contrasts strongly with the notion of engineering, design, and deliberative architecture. The idea is that design is an emergent property that will manifest itself as the work unfolds. The Wikipedia article puts it this way:

Emergent design in agile software development

Emergent design is a consistent topic in agile software development, as a result of the methodology's focus on delivering small pieces of working code with business value. With emergent design, a development organization starts delivering functionality and lets the design emerge. Development will take a piece of functionality A and implement it using best practices and proper test coverage and then move on to delivering functionality B. Once B is built, or while it is being built, the organization will look at what A and B have in common and refactor out the commonality, allowing the design to emerge. This process continues as the organization continually delivers functionality. At the end of an agile release cycle, development is left with the smallest set of the design needed, as opposed to the design that could have been anticipated in advance. The end result is a smaller code base, which naturally has less room for defects and a lower cost of maintenance.[1]
As emergent design is heavily dependent upon refactoring, practicing emergent design without a comfortable set of unit tests is considered an irresponsible practice.[citation needed]
I have to laugh at the last line: "practicing emergent design without ... [snip] ... is considered an irresponsible practice." It reminds me of the Molière comedy, Le Bourgeois gentilhomme, where the man finds he has been speaking prose all his life and didn't even know it. Likewise, many a team has been practicing "emergent design" for decades and thought they just hadn't used an architect.

There are a lot of parallels here with methodology pathology. Just adopting an agile or iterative methodology is not a silver bullet.  Process does not automate success! [The Rothering Principle] TheHackerCIO has seen plenty of projects with churning iterations that never seem to get anywhere; just as he has seen Waterfall projects that can't get out of one phase and into another, or that allow phase leakage. Process alone cannot ensure the absence of pathology. The Capability Maturity Model of Carnegie-Mellon seemed like a good thing, and I was a proponent of it, until I worked with outsourcing companies in a particular subcontinent (which will remain unnamed), all of whom were CMM level-5 certified, and none of whom could deliver one useable module of code to our project. I'd love to have seen a post-mortem analysis of how their coding failures made it back into the "feedback-loop" of their CMM program!

One manifestation of this attack is the TDD methodology. Taken rigorously, it directly assails the need for architecture: you are to take your assigned User Story from your Product Backlog, and after coding your first test case -- mind you, not one line of actual code has yet been written!!! -- after coding that test case, you now enter the "red" state, and are to determine ... what?
  • The best way to structure your code to allow for future eventualities? No! 
  • The best approach to code this method, so that you can reuse existing code? No!
  • The most elegant set of methods that will allow for maximal flexibility? No! 
  • A higher degree of generality to accommodate use cases that are going to come? No!
The principle adhered to here is YAGNI!

In contrast to attempting to architect, engineer, or design, one is supposed to take the most direct, clearest, fastest path to implement the user story, or at least a portion of the user story, and nothing but the user story. You are to "just do it," as the Nike slogan goes, and when your test case passes, you enter the "green" state. At this point, you are to consider refactoring the code you just wrote, to achieve whatever an Architect or designer might have attempted upfront.
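In miniature, the red/green cycle looks something like this. It is a deliberately trivial Python sketch; the `order_total` user story is hypothetical.

```python
# Red: the test is written first. Until order_total exists, calling this
# test fails with a NameError, which is the "red" state.
def test_total_is_sum_of_prices():
    assert order_total([2, 3]) == 5


# Green: the most direct code that makes the test pass, and nothing more.
# No discounts, no currencies, no tax hooks. YAGNI.
def order_total(prices):
    return sum(prices)


# Run the test; it raises AssertionError on failure and is silent on success.
test_total_is_sum_of_prices()
print("green")
```

Only now, with the test passing, does the methodology permit you to think about structure, by refactoring under the protection of that test.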

This is a direct assault on the notion of architecture, or on the need for it.  There are others.

Next up: Patterns and Pattern Languages

I Remain,

TheHackerCIO