Skip to main content

Debugging MapReduce in Javascript

Cross post from my employer's development blog: http://rootinc.github.io/2016/02/03/debug-mapreduce-js/

With single line arrow functions in ES6, it’s getting easier and cleaner to use JS’s native Array.prototype.map() and Array.prototype.reduce() functions for molding our data. I recently decided to go the MapReduce route for a project to filter some data upfront. Inevitably, I ran into some issues with my logic and wasn’t getting what I expected. Thing is, there doesn’t seem to be a simple way to add debugging to a chain of map() and reduce() calls, particularly if you’re using single line arrow functions.

     let behaviors = track.metrics
       .reduce((list1, m) => list1.concat(m.behaviors), [])
       .reduce((list2, b) =>
         ((list2.indexOf(b) === -1 && list2.push(b)) || 1) && list2
       , [])
       .map(bId => json.behaviors[bId]);

Here I need to take multiple arrays of id’s, flatten them, remove duplicates, then map in their respective object for the given id. I had an issue with the final value at first (note, above code is correct) and wanted to debug the values between each method call in the chain.

How do I do this? Well, very rough approach would be to switch to bracketed arrow functions, which will allow me to run two statements in the function.

     let behaviors = track.metrics
       .reduce((list1, m) => {
         console.log(list1);
         return list1.concat(m.behaviors)
       }, [])
       .reduce((list2, b) => {
         console.log(list2);
         return ((list2.indexOf(b) === -1 && list2.push(b)) || 1) && list2
       }, [])
       .map(bId => json.behaviors[bId]);

This works but suffers a couple drawbacks. You lose the syntactic conciseness of the implicitly returning no bracket arrow functions, resulting in what looks like a borderline ES5 mess. Second, I get way too many calls to console.log as the functions get run for every item in the array. So I would have to add additional logic to each function to prevent that. I’m not even going to bother typing that all out here.

So, I wrote up a rather simple factory function to use in this case.

function MRDbg(...txt){
 return function(prev, cur, i, arr){
   if (i === 1){
     console.log.apply(console, [...txt, arr]);
   }
   return arr;
 }
}

It expects to be called as a reduce() handler and allows you to label each debug call. So it fits nicely into my current MapReduce chain, and, here’s a big one, I don’t have to modify my existing functions and then worry about changing them back later!

let behaviors = track.metrics
 .reduce((list1, m) => list1.concat(m.behaviors), [])
 .reduce(MRDbg(‘first’))
 .reduce((list2, b) =>
   ((list2.indexOf(b) === -1 && list2.push(b)) || 1) && list2
 , [])
 .reduce(MRDbg('second'))
 .map(bId => json.behaviors[bId]);
});

Much cleaner! We then get some very simple console outputs.

first [140, 141, 142, 143, 144, 132, 145, 146, 147, 148, 149, 150, 151, 152, 153, 87, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 100, 178, 89, 179, 101, 102, 98, 176, 177, 180, 100, 178, 89, 179, 101, 181, 102, 182, 140, 141, 142, 144, 145, 146, 147, 148, 149, 183, 150, 152, 184, 87, 154, 155, 158, 159, 160, 110, 185, 163, 112, 164, 166, 167, 168, 169, 170, 171, 173, 186, 174, 175]
main.js:115
second [140, 141, 142, 143, 144, 132, 145, 146, 147, 148, 149, 150, 151, 152, 153, 87, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 100, 178, 89, 179, 101, 102, 98, 180, 181, 182, 183, 184, 110, 185, 112, 186]

You can find a gist of the function here along with an ES5 compatible version.

Comments

Popular posts from this blog

Quick Deepstream.io Setup Using JSPM

Cross post from my employer's development blog: http://rootinc.github.io/2016/02/12/deepstream-jspm/
Want to use JSPM rather than Bower for running the Deepstream.io example? Follow these steps. This is basically a duplicate of the [Getting Started tutorial][tutorial] on the [Deepstream.io website][website] but using a really simple JSPM setup. This is a very crude guide where I list everything I had to do to get things running.

Create an empty project folder npm install deepstream.io Copy server code verbatim from the Getting Started guide jspm install npm:deepstream.io-client-js Hit enter for all the prompts from JSPM

We’re going to modify the client side code a little bit. We have native support for ES6 compiling with JSPM/Babel so we can import the Deepstream client directly:

import deepstream from 'deepstream.io-client-js'; let ds = deepstream( 'localhost:6020' ).login();

let record = ds.record.getRecord( 'someUser' );

let input = document.querySelector( 'inp…

Atari E3 2004 PAL digital press kit

Making note of some old swag. The Atari E3 2004 PAL digital press kit. See video for details.






Accessing other HTTP servers on Cloud 9 IDE

If you're using Cloud 9 to do development, you'll quickly realize that only ports 8080 through 8082 are available to the outside world from your development box. This is generally not an issue as you can set your application to bind to the $PORT environment variable when in development mode. However, there are sometimes other servers that we want to make use of that host on different default ports.

I recently had to setup a Neo4j server which defaults the admin interface of port 7474. Unfortunately, I could not access the admin interface even through the IDE based web browser window. So, what to do? I could change the default server settings so that it runs on a different port. However, the app I'm working on with a team has 7474 hard-coded and I currently don't feel like writing a local only work-around.

After some searching, I ran across a neat Linux tool called socat. This allows us to easily forward one port to another. After a quick install via apt-get, I ran the …