Monday, October 8, 2012

Crazy Social Analytics for C# Nuget

@YaronNaveh

I'm happy to announce that C# nuget projects get some GitMoon love! This means you can get crazy social analytics about nuget projects. Check out your favorite ones:

SignalR
ServiceStack
Mono.Cecil
Facebook
Hammock
sqlite-net

Or check out famous head to head comparisons:

SignalR vs. ServiceStack
Mono.Cecil vs. LibGit2Sharp
sqlite-net vs. FluentMongo
TweetSharp vs. Facebook
Hammock vs. EasyHttp






What's next? get this blog rss updates or register for mail updates!

Thursday, October 4, 2012

10 Caveats Neo4j users should be familiar with

@YaronNaveh

UPDATE: Michael Hunger from neo4j responds to some of my items in a comment.

Recently I used the Neo4j graph database in GitMoon. I have gathered some of the tricky things I learned the hard way and I recommend any Neo4j user to take a look.

1. Execution guard
By default queries can run forever. This means that if you have accidentally (or on purpose) sent the server a long-running query with many nested relationships, your CPU may be busy for a while. The solution is to configure neo to terminate all queries that run longer than some threshold. Here's how you do this:

In neo4j.properties add this:

execution_guard_enabled=true

Then in neo4j-server.properties add this:

org.neo4j.server.webserver.limit.executiontime=20000

where the limit value is in milliseconds, so the above will terminate each query that runs over 20 seconds. Your CPU will thank you for it!


2. ID (may) be volatile
Each node has a unique ID assigned to it by neo, so in your Cypher you could do something like:

START n=node(1020) RETURN n

START n=node(*) where ID(n)=1020 return n

where both cyphers will return the same node.

Early on I was tempted to use this ID in urls of my app:

/projects/1020/users


This was very convenient since I did not have a numeric natural key for nodes and I did not want the hassle of encoding strings in urls.

Bad idea. IDs are volatile. In theory, when you restart the db all nodes may be assigned with different IDs. IDs of deleted nodes may be reused for new nodes. In practice, I have not seen this happen, and I believe that with the current neo versions this will never happen. However you should not take it as guaranteed and should always come up with your own unique way to identify nodes.

3. ORDER BY lower case
There is no built-in function that allows you to return results ordered by some field in lower case. You have to maintain a shadow field with the lower case values. For example:

RETURN n.name ORDER BY n.name_lower
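
One way to keep such a shadow field in sync is to derive it in app code just before a node is saved. The sketch below is only an illustration in plain JavaScript: the helper name withLowerShadow and the "_lower" suffix convention are assumptions, not part of Neo4j or of any driver.

```javascript
// Hypothetical helper (not a Neo4j or driver API): returns a copy of the
// node properties with an extra "<field>_lower" shadow property.
function withLowerShadow(props, field) {
  const copy = Object.assign({}, props);
  copy[field + "_lower"] = String(props[field]).toLowerCase();
  return copy;
}

// Build the properties you would hand to your driver when creating or
// updating the node, so name and name_lower never drift apart.
const nodeProps = withLowerShadow({ name: "SignalR", type: "project" }, "name");
console.log(nodeProps);
```

Calling the helper on every create and update is what makes ORDER BY n.name_lower trustworthy; a node saved without it would be missing the shadow field.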

4. Random rows
There is no built in mechanism to return a random row.

The easiest way is to use a two-phase randomization - first select the COUNT of available rows, then SKIP rows until you get to that row:

START n=node(*)
WHERE n.type='project'
RETURN count(*)

// result is 1000
// now in your app code you make a draw and the random number is 512

START n=node(*)
WHERE n.type='project'
RETURN n
SKIP 512
LIMIT 1

An alternative is to use statistical randomization:

START n=node(*)
WHERE n.type='project' AND ID(n)%20=0
RETURN n
LIMIT 1

where 20 is a number you generated in your code. Of course this will never be fully randomized, and it also requires some knowledge of how the values are distributed, but for many cases this may be good enough.
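
The app-side half of the two-phase approach can be sketched as follows. This is a minimal illustration, assuming the count from the first query has already been fetched; the function names randomOffset and randomRowQuery are made up for this example, and the optional rng argument exists only to make the draw testable.

```javascript
// Turn the row count returned by the first query into a random SKIP offset.
// rng is optional and defaults to Math.random.
function randomOffset(count, rng) {
  if (!Number.isInteger(count) || count <= 0) {
    throw new Error("count must be a positive integer");
  }
  const random = rng || Math.random;
  return Math.floor(random() * count); // an integer between 0 and count - 1
}

// Build the second query. The offset is an integer generated by our own
// code (not user input), so concatenating it into the query is safe here.
function randomRowQuery(count, rng) {
  return [
    "START n=node(*)",
    "WHERE n.type='project'",
    "RETURN n",
    "SKIP " + randomOffset(count, rng),
    "LIMIT 1"
  ].join("\n");
}

console.log(randomRowQuery(1000));
```

Note that the count can change between the two queries, so the second query may occasionally return no row; retrying is the simplest way to handle that.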


5. Use the WITH clause when cypher has expensive conditions
Take a look at this cypher:

START n=node(...), u=node(...)
MATCH p = shortestPath( n<-[*..5]-u) WHERE n.forks>20 AND length(p)>2
RETURN n, u, p

Here we will calculate the shortest path for all nodes. This is a CPU-intensive operation. How about separating concerns like this:

START n=node(...), u=node(...)
WHERE n.forks>20
WITH n as n, u as u
MATCH p = shortestPath( n<-[*..5]-u ) WHERE length(p)>2
RETURN n, u, p

Now the path is only calculated for the relevant nodes, which is much cheaper.


6. Arbitrary depth is evil
Always strive to limit the max depth of queries you perform. Each depth level increases the query complexity:

...
MATCH (n)<-[:depends_on*0..4]-(x)
...

7. Safe shutdown on windows
When you run Neo4j on windows in interactive mode (i.e. not as a service) do not close the console with the x button. Instead, always use CTRL+C and then wait a few seconds until the db is safely closed and the window disappears. If by mistake you did not close it safely then the next start will be slower (it can take a few minutes or more) since neo will run recovery. In that case the log (see #8) will show this message:

INFO: Non clean shutdown detected on log [C:\Program Files (x86)\neo4j-community-1.8.M03\data\graph.db\index\lucene.log.1]. Recovery started ...

8. The log is your best friend
When crap hits the fan always turn to /data/log. Especially if neo does not start, you may find out that you have misconfigured some setting or that recovery has started (see #7).

9. Prevent cypher injection
Take a look at this code:

"START n=node(*) WHERE n='"+search+"' RETURN n"

If "search" comes from an interactive user then you can imagine what kind of injections are possible. The correct way is to use Cypher parameters, which any driver should expose an API for. If you use the awesome node-neo4j API by aseemk you could do it like this:

qry = "START n=node(*) WHERE n={search} RETURN n"
db.query qry, {search: "async"}

10. Where to get help
The Neo4j Google group or the community github project are very friendly and responsive.


Wednesday, October 3, 2012

How I Built GitMoon

@YaronNaveh

I got some queries on how I built GitMoon so I decided to come up with this list in BestVendor:

How I Built a Viral Node.js App in Just One Weekend

You can read there about the technology and tool choices I've made and why. Got a cool cover image too. Check it out in BestVendor.



Tuesday, October 2, 2012

MongoDB and Redis go head to head with Node.js social analytics

@YaronNaveh

And not just mongo vs. redis but also jade vs. ejs, azure vs. jitsu and anything else you want! All in today's GitMoon new rollout. Here are some of the amazing geo-social visualizations you get when you compare two projects:






More visualizations are in GitMoon.

Check out some of the popular head-2-head comparisons:


Thursday, September 27, 2012

GitMoon now has country, company drill down and some amazing graphs too

@YaronNaveh

Check out the great new stuff in GitMoon

Ever since I published GitMoon a couple of months ago I have been getting great feedback on Twitter and by mail. A lot of you also hinted at what you want to see next. So now is the time to thank you all for the feedback and also to show you what came out of it :)

Here's what has just been deployed to GitMoon:

Friends country / company drill down


Now when you're in the "users" tab you have an option to analyze the project users by country / company / project. So you can answer questions like "how many express.js users are from China" or "How many Yahoo employees use mongoose".

If you take a look at the side map you can also drill down into US states.


Dependency force-directed graph
This amazing piece of d3.js magic shows you all the dependencies of a project, and their dependencies, all the way up. You can access it via the "projects" tab.


CodeBack drill down
CodeBack is one of GitMoon's most useful features. Previously it was just one big list of all usages of the shown project. Now you can filter by the calling project, which makes it very useful for both module authors and consumers.


Amazing new landing page
The landing page is the project face so I decided to give it some SVG love.


Go have fun with your favorite projects!

If you love GitMoon please tweet to @YaronNaveh and your universe.


Tuesday, July 31, 2012

GitMoon is social analytics for Node.js open-source developers

@YaronNaveh


Check out GitMoon - social analytics for Node.js open-source projects!

Open source is fun. Sure, a lot of hard work is involved, but it's great to do something for the community! Now how frustrating is it to release our project into the wild and never know who is using it (who as in face and picture)? Or to never know how successful our project is? Not to mention seeing what typical usage looks like so we can improve the next versions.

Embracing GitHub
Github was an amazing game changer here. I love GitHub (Ben and Marc seem to concur) and use it a lot. I also predict great success for github enterprise in this era of IT consumerization: Open source developers are consumers and no old school CIO will tell them which scm to use. And here comes the but: Github is not perfect for the social and analytics needs of the community.

(tl;dr rant) What does it mean to watch a project in github? Does GH watch == Facebook like? Is it "I use it"? It sure is spamming my feed with every check in made to that project. I LIKE node.js but I can't WATCH it. Too much noise. Or let's talk analytics. The GH analytics module is very oriented to give the consumers visibility into how viable and alive the project is. This is a great decision-support tool for them. But let's not forget the project developer! We, developers, want to know who is using us. Who as in name and face. Social, you know. We want to know how successful our project is. How many people use it? How many projects depend on it? If 1 project depends on my project, and 10 projects depend on that one project, and 10 more projects depend on each one of them, then the way I see it the number of projects that depend on me is 1+10+10*10, or 111! Moreover, not only 5 users watch my project anymore, but hundreds of users watching every single project in the "network"!

Wouldn't it be cool to have visibility into all this?



So is it an ego thing? While there is nothing wrong with it, you should stay out of the github kitchen if you don't want anyone forking out with "your" stuff. But there's far more than ego here. You want to know how your project is being used so you can decide what's the next steps and next milestone priorities. How about flipping a journal with all the code excerpts that use your code? You'll love CodeBacks:


Ever wondered if your project needs to co-exist with fingernails toejam 2.0? Knowing what other libraries your project users employ together can hint you on your real testing priorities. Meet similar projects:


All this git
GitMoon embraces Github. To start with, it uses GitHub's excellent API. The first edition of GitMoon is node.js flavoured, so npm information is also used. Npm is an amazingly simple, gets-the-job-done package manager - you're now able to analyze it.

Not sure where to go next? Try async, mocha, mongoose, azure, ws.js or any of your favorite node projects in GitMoon.


Tuesday, June 19, 2012

Wcf and CoffeeScript, who said that opposites don't attract?

@YaronNaveh

At first look Wcf and CoffeeScript are very different: Wcf is chatty on the config side and bloated on the wire, CoffeeScript is just a "little language".

On second look... look at this:


So while Wcf still can't make coffee, CoffeeScript sure does make these fine custom bindings!

This magic is done via Wcf.js, the first ws-* implementation for Node.js. Wcf.js is written in pure coffeescript javascript. What's next, will Microsoft build a tablet?


Sunday, June 10, 2012

12 common wcf interop confusions

@YaronNaveh

I get mails almost on a daily basis from people asking me how to build a Wcf client to consume a service of framework X (usually axis or wsit but others as well). After getting hundreds of these mails in the recent years I conclude that there is a single most common setting which most people need. There are also common confusions that a lot of people stumble on in their first try. In this post I will present the common setting, and what can (and will) go wrong.

The mails I get usually start with this soap sample which people want wcf to send:


Optionally ssl is also used.

The wcf setting required here is a custom binding with an authentication mode of "mutualCertificate":


(where https may be used instead of http)

Confusion 1: Using the wrong SOAP version can cause the server to return different kinds of exceptions. Make sure the "messageVersion" attribute on the textMessageEncoding element fits your needs. In particular if no ws-addressing headers are used (To, ReplyTo, MessageID and Action) then use "Soap11" (as above) or "Soap12" without any ws-addressing version.

Confusion 2: The proxy may throw this exception:

The client certificate is not provided. Specify a client certificate in ClientCredentials.

That's an easy one: you must configure a client certificate, which is pretty basic for a signing scenario. You can do it from code or config. Here is the config version:


Confusion 3: You still get the same error after you have defined the certificate. In this case make sure you have configured the endpoint to use the behavior:


Confusion 4: When I use mutualCertificate authentication mode I define my client certificate. I do not have a server certificate to define. My proxy is not sending anything and throws this error:

The service certificate is not provided for target 'http://localhost/MyWebServices/Services/SimpleService.asmx'. Specify a service certificate in ClientCredentials.

The issue is that mutualCertificate always requires you to define a server certificate. In some cases you may not need it. In such cases it is ok to define some dummy certificate as the server certificate, even can be the same certificate you use for the client:


Of course you may also do so from code.

Confusion 5: You may get this error:

"The X.509 certificate CN=WSE2QuickStartServer chain building failed. The certificate that was used has a trust chain that cannot be verified. Replace the certificate or change the certificateValidationMode. A certificate chain processed, but terminated in a root certificate which is not trusted by the trust provider.\r\n"

This typically means the server certificate you have defined is not trusted by your machine. In the case that you have defined a dummy server certificate (see confusion 4) or in other cases - at your risk and for testing purposes only - you can turn off this validation by setting certificateValidationMode to None.


Confusion 6: I am getting a good response from the server but the proxy throws this exception:

The incoming message was signed with a token which was different from what used to encrypt the body. This was not expected.


Congratulations, it turns out you need to define a real server certificate anyway (so the dummy certificate trick from confusion 4 does not apply). You should get it from the service author. But if you can't, there is a nice trick to infer the certificate by extracting the value of the binary security token from the message and saving it to disk (in binary form) as alluded here.


Confusion 7: I am getting a good response from the server but the proxy throws this exception:

Security processor was unable to find a security header in the message. This might be because the message is an unsecured fault or because there is a binding mismatch between the communicating parties. This can occur if the service is configured for security and the client is not using security.

This means the service is not signing the response even though you sent a signed request. In .Net 4+ you can turn off the secured response requirement by toggling the security channel in your custom binding:


Confusion 8: When I use mutualCertificate I see my proxy sends a message in a format very different from what I need. In particular there is no signature but only encryption, something like this:


What you need to know is that by default messages will be signed AND encrypted, and moreover the encryption will also encrypt the signature and "hide" it from your eyes. The solution is to set the correct protection level on your contract:


btw while interoperating with some java stacks you will know you are in confusion #8 if you get this error:

General security error (WSSecurityEngine: No crypto property file supplied for decryption)

Confusion 9: After applying the mitigation to confusion 8 the outgoing message is still not in the desired format. In particular the message is not signed by the binary token but by a derived token, and there is a primary and a secondary signature instead of just one:



For all things interop wssecurity10 is your friend and wssecurity11 is the enemy. Keep your friends close! Make sure the messageSecurityVersion attribute has a value that starts with wssecurity10:


Confusion 10: You get this error:

Identity check failed for outgoing message. The expected DNS identity of the remote endpoint was 'localhost' but the remote endpoint provided DNS claim 'WSE2QuickStartServer'. If this is a legitimate remote endpoint, you can fix the problem by explicitly specifying DNS identity 'WSE2QuickStartServer' as the Identity property of EndpointAddress when creating channel proxy.

You fell for the oldest trick in the book! Just do exactly what the error tells you to do. Yes, it's ok...

Confusion 11: You get a good response from the server but the proxy throws this error:

No Timestamp is available in security header to do replay detection

or this one:

The security header element ‘timestamp’ with ‘Timestamp-xxxx’ id must be signed.

These may happen when you send the server a signed timestamp, so wcf expects to get one back AND to have it signed. So either you do not get one back or it is not signed. For a start, try to set the includeTimestamp property on the "security" binding element to false. But this will not work if the server actually requires a timestamp. If it requires one but unsigned then write a custom encoder for your proxy and manually generate and push the timestamp header to the request. If the server requires a signed timestamp then your only hope is to set allow unsecured response to true (.net 4 only):



AND to strip out ANY remains of the "security" tag from the response (not just the timestamp) using a custom encoder. If WCF will see the security tag then it will be very defensive and try to validate it. Of course if the security tag which you removed contains some signature this means you will not be able to validate it, which is a shame. I'm not familiar with any better workaround at this moment, so I'm investigating a few directions.

Confusion 12: Ssl is used, and you try certificateOverTransport instead of mutualCertificate authentication mode on your custom binding. You may get away with the request, since it is similar, but once the response comes back you may experience:

Cannot find a token authenticator for the 'System.IdentityModel.Tokens.X509SecurityToken' token type. Tokens of that type cannot be accepted according to current security settings.

What's going on here? certificateOverTransport assumes the client authenticates with a message level certificate, but the server authenticates with its transport ssl certificate. However a more common use case is that the server also authenticates with a message level certificate, in addition to its transport one. You could identify such scenario by seeing a signature element in the server response. This means you need a mutualCertificate authentication mode together with an https transport binding element:


Summary
When Wcf consumes third party services, the most common authenticationMode would be "mutualCertificate". Make sure you have tried all combinations of this setting before trying other settings. Of course if you are in a situation where mutualCertificate clearly does not apply (e.g. username authentication) then this is not relevant for you. But even when usernames are used they may still be in combination with a client certificate, in which case it would still make sense to use SecurityBindingElement.CreateMutualCertificateBindingElement() for bootstrap and add the username as a supporting token.


Thursday, June 7, 2012

Tesseract training cheatsheet

@YaronNaveh

As I wrote last time, tesseract-ocr is an open source ocr library originally developed by HP Labs and today maintained by Google. Tesseract can be trained by you to support more languages and fonts. I have trained tesseract to read my handwriting and achieved over 90% accuracy - though this still means that roughly one in every 10 characters is wrong. This page explains how you can train tesseract by yourself. This post will share some of the conclusions and pitfalls I have found from my experiment.





  • the tesseract training page is your friend. Follow the instructions!

  • The instructions work. Even if they are full of details and you are sure you (or the authors) got something wrong, remember that if you follow them carefully they will work for you.

  • One can't overestimate the importance of a high quality picture for the ocr process. Use a good camera, make sure there is enough light. If you control the written text then a good paper and a thick pen are also helpful.

  • Scanner apps for mobile (CamScanner is my favorite) are critical, though they do not replace the need for a quality picture.

  • If you write on a blank paper, all text should be well aligned to virtual lines. Also no letter should stand out of the line (for example watch out not to write the letter 'p' too high over other letters.)

  • If you control the written text, consider developing your own "font" so some of the ambiguous letters are really differentiated. For example I have decided to put an underline below all numbers and also under the letter n which can be mistaken for h or r.

  • I used the jTessBoxEditor for the box files. Its advantage over the other editors is that it supports multi page tiff files, which can be a good process to follow.

  • When you auto generate the box file the generated file may not identify some letters - that is, as opposed to getting a letter wrong, it will not identify that there is a letter at all. From my experience there is no point in manually adding a box on that letter since it will never be identified. If too many letters are not identified you need to improve the quality of the photo or the ink and also make sure these letters are aligned in the same line as other letters.

  • If text lines start in the middle of a row, or if they are not nicely aligned one under the other, then there is a good chance tesseract will get them wrong.

  • Sometimes tesseract got the last line wrong, but when I added a dummy line below it the real last line worked well.

  • When trying to recognize multi line text I got better results than when trying on a single line.

  • When trying on a single line I got better results when the image I used was not too large (so if the camera creates big pictures it is better to resize them).

  • When you create a box file make sure to use some existing language dictionary (if there is one) to bootstrap the identification. It does not matter which language you use since tesseract only uses it to generate the box file and it will not affect the final dictionary.

  • ImageMagick can be used to add some image to a multipage tiff file:

    convert.exe img1.bmp img2.jpg -adjoin res.tiff


    Common Errors



    Error: Illegal short name for a feature!
    signal_termination_handler:Error:Signal_termination_handler called:Code 2000

    I got this error after the .box file got corrupted for some reason. I have opened it and using "binary search" I deleted a different part of it every time and tried to build it again, until I found the wrong line. Typically the wrong line is because tesseract is identifying some very tiny dots as letters.
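
The "binary search" over the box file can be sketched like this. It is only an illustration under two assumptions: exactly one line is bad, and you have some way to check whether a subset of lines builds. The buildsOk predicate below is a stand-in that flags very tiny boxes (lines are in the box file layout: character, left, bottom, right, top, page); in practice it would write the candidate lines to a .box file and re-run the training step.

```javascript
// Find the index of the single bad line by halving the search range.
// buildsOk(lines) must return true when the given lines build cleanly.
function findBadLine(lines, buildsOk) {
  let lo = 0;
  let hi = lines.length; // the bad line is somewhere in [lo, hi)
  while (hi - lo > 1) {
    const mid = Math.floor((lo + hi) / 2);
    if (buildsOk(lines.slice(lo, mid))) {
      lo = mid; // first half is clean, so the culprit is in the second half
    } else {
      hi = mid; // the culprit is in the first half
    }
  }
  return lo;
}

// Stand-in predicate (an assumption, not tesseract's real check):
// a line is "bad" if its box is narrower or shorter than 2 pixels.
function noTinyBoxes(lines) {
  return lines.every((line) => {
    const [, left, bottom, right, top] = line.split(" ").map(Number);
    return right - left >= 2 && top - bottom >= 2;
  });
}

const boxLines = [
  "a 10 20 30 50 0",
  "b 35 20 55 50 0",
  ". 60 20 60 21 0", // a tiny dot picked up as a letter
  "c 65 20 85 50 0"
];
console.log(findBadLine(boxLines, noTinyBoxes)); // prints 2
```

With a real build step each check is slow, so halving the range keeps the number of rebuilds to roughly log2 of the line count.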

    Writing Merged Microfeat ...Warning: no protos/configs for { in

    CreateIntTemplates()
    Class->NumConfigs == this->fontset_table_.get(Class->font_set_id).size:Error:Assert failed:in file ..\classify\intproto.cpp, line 1312

    As stated here, tesseract 3.0.1 only supports one image per font. It actually crashes when you try to use another image (exp2). You may want to use a multipage tiff file if you need multiple images. This way you can always push more images to an existing font without losing the previous coordinates. Generating a box file for the new tiff will overwrite the existing one (which you have probably manually fixed), so I have built a utility to back up the previous one and copy all values from previous tiff pages to the newly generated box file.

    read_params_file: Can't open batch.nochop

    The Windows executable package does not include the configs. You will need to copy the 'tessdata' from the source distribution to the same directory as tesseract.exe to perform training (e.g. the source has two folders under tessdata which we need: configs and tessconfigs).

    tessdata_manager.SeekToStart(TESSDATA_INTTEMP):Error:Assert failed:in file adaptmatch.cpp, line 512 Segmentation fault

    You did not follow the documentation - before unifying into traineddata you need to:
    "All you need to do now is collect together all (normproto, Microfeat, inttemp, pffmtable) the files and rename them with a lang. prefix..."

Monday, June 4, 2012

    Code Obscura: Executing pictures of code

    @YaronNaveh

    Pen and paper have always been loyal friends to humans. Smartphones and tablets are trying to change this reality. Instead, they should embrace it!

    Take a look at the following concept video I made:



    (the beginning of the movie also introduces CamScanner by IntSig Information, a great scanner app)

    Cool! Is it useful?
    Suppose you're sitting in a restaurant with some friends. Suddenly you're debating what the 26th Fibonacci number is. What's easier than taking out a piece of paper and writing this:


    What if you could pull out your smart phone, take a picture of the code, and then...this:


    Now assume you've just finished some design meeting in your day job and the white board is full of complex algorithms:


    Don't you wish you could just *run* them?

    There are plenty of other occasions where executing written code can come in handy like job interviews, university courses, and all things serendipity.

    One known limitation of my approach is that instagram photos are not supported. Live with that :)


    How does it work?
    First note that CodeObscura is a concept plus a prototype. For now it is not a product which you can install. I'd love to get your feedback on this concept.

    As you can see CodeObscura takes pictures of code and performs ocr on them. It then executes the recognized text as code on a node.js instance and reports back the results. Most of this takes place on a cloud - you just need to stay tuned near your mobile.

    The hardest part in implementing a product like this is to perform ocr on handwritten text. I have not seen a product or a library that does this in a rock solid manner. Fortunately, we do not need to recognize arbitrary text. We should be fine telling our users (us) to write the code using well separated letters and to be careful with some known ambiguous letters. Nevertheless ocr is still the achilles heel of the concept.

    Building your own CodeObscura
    About 95 percent of your time should be dedicated to the ocr part. I chose to use Tesseract - an open source ocr library originally developed by HP Labs and today maintained by Google. Tesseract will not identify your handwriting up front. You will need to train it. Since I've been training tesseract for some time now I know you would appreciate these tips:

  • the tesseract training page is your friend. Follow the instructions!

  • The instructions work. Even if they are full of details and you are sure you (or the authors) got something wrong, remember that if you follow them carefully they will work for you.

  • One can't overestimate the importance of a high quality picture for the ocr process. Use a good camera, make sure there is enough light. A good paper and a thick pen are also very important.

  • Scanner apps for mobile (CamScanner is my favorite) are critical, but do not replace the need for a quality picture.

  • If you write on a blank paper, all text should be well aligned to virtual lines. Also no letter should stand out of the line (for example watch out not to write the letter 'p' too high over other letters.)

    I promise to come up with a more massive tesseract cheatsheet soon.

    OCR-H - An ocr-friendly font for humans
    The problem of understanding hand written text needs to cope with many inherent ambiguities. OCR-A is a font invented in the late 60s to make it easy for ocr enabled devices to scan and understand text. This was a machine font so we cannot expect humans to follow it. To accelerate the recognition of hand written texts by commodity ocr libraries I have invented OCR-H - a "font" meant to be written by humans. Of course two letters written even by the same human are never the same, so OCR-H is more of a high level style and shape for characters to make them unique enough for a computer.

    OCR-H rules:

  • letters a-z are written as you typically write them
  • one exception is the letter n which is written with an underscore
  • all numbers are written with underscore as well as many punctuation signs
  • the signs ; + - * are written with a circle around them

    For example:



    This is just the first brush on ocr-h. I have found it to increase the success rate of commodity ocr libraries.

    Mobile app
    This part is pretty straight forward so I will not go into details. I used android, so the key part is to register the app for the "Send" event so that it appears in the list of options when you share a picture:





    then you can access the image when your activity starts like this:


    all you need to do now is to send the image to your cloud server as binary http payload and display the message you get back to the user.

    Server side
    I used a very basic node.js server side here. Not a lot to say about it except that at the moment it calls tesseract as a separate process which is not very scalable. Also eval() may raise some security concerns. You can see the rest here:
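
On the eval() concern: one way to narrow the exposure is to run the recognized text through Node's built-in vm module with a timeout instead of calling eval() directly. The sketch below is not the prototype's actual code, and runSnippet is a name made up for this example. Keep in mind that Node's own documentation says vm is not a security mechanism, so this limits accidents (runaway loops, access to the server's globals) rather than making untrusted code safe.

```javascript
const vm = require("vm");

// Run recognized code in a fresh context with a 1 second time limit.
// Only a minimal console is exposed; its output is captured and returned.
function runSnippet(code) {
  const output = [];
  const sandbox = {
    console: { log: (...args) => output.push(args.join(" ")) }
  };
  try {
    const result = vm.runInNewContext(code, sandbox, { timeout: 1000 });
    return { ok: true, result: result, output: output };
  } catch (err) {
    return { ok: false, error: String(err.message), output: output };
  }
}

// Example: text as it might come back from the ocr step.
const recognized =
  "function fib(n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }\n" +
  "fib(10);";
console.log(runSnippet(recognized)); // result is 55
```

The returned object is what the server would serialize back to the mobile app, so errors from badly recognized code reach the user as a message instead of crashing the process.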


    Now what?
    Code Obscura already has a prototype I have written. It is pretty cool to take photos on a mobile phone - miles meters away from a PC - and execute them on the fly. Sure, there are a few humorous use cases for it, but I believe there's a real reason to take this idea a real step further. It is a fact that writing on a paper is much faster than typing on a mobile device. Combine that with a strong ocr library - I wonder if our next IDE will be a pen and a paper.

Saturday, May 26, 2012

    Wcf.js message level signature? Check.

    @YaronNaveh

    This is a very exciting moment for Wcf.js. It now supports one of the most common WS-Security scenarios - x.509 digital signatures. This is the first WS-Security implementation ever in javascript to support this. This implementation relies on xml-crypto, which I told you about last time.

    Look at any of the following Wcf bindings:



    Assume only signatures are used (no encryption):


    Then a soap request would look like this:


    You can now interoperate with such services from javascript using Wcf.js with this code:


    Note that a pem formatted certificate needs to be used. Wcf likes pfx formats more, so check out the instructions here on how to do the conversion.

    You should also be aware that Wcf.js by default does not validate incoming signatures from the server. If you wish to validate them check out the sample here.


    Sunday, May 20, 2012

    Xml-Crypto: An Xml digital signature library for Node.js

    @YaronNaveh

    Get xml-crypto on github

    Node.js does not always have the right libraries for Xml operations. When such libraries exist they are not always cross platform (read: work on windows). I've just published xml-crypto, the first xml digital signature library for node. As a bonus this library is written in pure javascript so it is cross platform.

    What is Xml Digital signature?
    There's a tl;dr version here. The essence is that dig-sig allows us to protect content from unauthorized modification by telling us who created that content and whether anyone has altered it since. Xml dig-sig is a special flavour which has some interesting implementation aspects.
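    To make the "has anyone altered it" part concrete, here is a minimal sketch that uses only node's built-in crypto module, not xml-crypto. The key pair is a throwaway generated on the spot, the helper names sign and isUntouched are made up for illustration, and it uses RSA-SHA256 rather than RSA-SHA1 because some newer OpenSSL configurations restrict SHA1 signatures. It signs a plain string, so it skips everything that makes xml dig-sig special (references, canonicalization).

```javascript
var crypto = require('crypto');

// Throwaway RSA key pair, generated only for this demo
var keys = crypto.generateKeyPairSync('rsa', { modulusLength: 2048 });

// Hypothetical helper: sign the content with the private key, signature as base64
function sign(content) {
  return crypto.createSign('RSA-SHA256').update(content).sign(keys.privateKey, 'base64');
}

// Hypothetical helper: true only if the content is byte-for-byte what was signed
function isUntouched(content, signature) {
  return crypto.createVerify('RSA-SHA256').update(content).verify(keys.publicKey, signature, 'base64');
}

var original = '<book><name>Harry Potter</name></book>';
var signature = sign(original);

console.log(isUntouched(original, signature));
console.log(isUntouched(original.replace('Harry', 'Larry'), signature));
```

    Anyone holding the public key can run the check, but only the holder of the private key can produce a signature that passes it.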

    A typical xml signature looks like this:


    Installing Xml-Crypto

    Install with npm:

    npm install xml-crypto

    A prerequisite is to have openssl installed and its /bin directory on the system path. I used version 1.0.1c but it should work on older versions too.

    Signing an xml document

    Use this code:


    The result will be:


    Note:

    sig.getSignedXml() returns the original xml document with the signature pushed as the last child of the root node (as above). This assumes you are not signing the root node but only sub node(s) otherwise this is not valid. If you do sign the root node call sig.getSignatureXml() to get just the signature part and sig.getOriginalXmlWithIds() to get the original xml with Id attributes added on relevant elements (required for validation).

    Verifying a signed document

    You can use any dom parser you want in your code (or none, depending on your usage). This sample uses xmldom so you should install it first:

    npm install xmldom

    Then run:


    Note:

    The xml-crypto api requires you to supply the xml signature ("<Signature>...</Signature>", in loadSignature) and the signed xml (in checkSignature) separately. The signed xml may or may not contain the signature in it, but you are still required to supply the signature separately.

    Supported Algorithms

    The first release always uses the following algorithms:

  • Exclusive Canonicalization http://www.w3.org/2001/10/xml-exc-c14n#
  • SHA1 digests http://www.w3.org/2000/09/xmldsig#sha1
  • RSA-SHA1 signature algorithm http://www.w3.org/2000/09/xmldsig#rsa-sha1

    You can extend xml-crypto with further algorithms. I will author a post about it soon.
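    As a rough illustration of what the SHA1 digest step amounts to, the sketch below computes a base64 SHA1 value with node's built-in crypto module. It is not xml-crypto code: digestValue is a hypothetical helper and it expects a string that has already been canonicalized. It also hints at why canonicalization comes first: two spellings of the same empty element produce different digests.

```javascript
var crypto = require('crypto');

// Hypothetical helper: base64-encoded SHA1 over already-canonicalized xml text
function digestValue(canonicalXml) {
  return crypto.createHash('sha1').update(canonicalXml, 'utf8').digest('base64');
}

var expanded = digestValue('<x></x>');
var selfClosed = digestValue('<x/>');

console.log(expanded);
console.log(selfClosed);
console.log(expanded === selfClosed); // same element, different bytes, different digest
```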

    Key formats

    You need to use .pem formatted certificates for both signing and validation. If you have pfx x.509 certificates there's an easy way to convert them to pem. I will author a post about this soon.

    The code

    Get xml-crypto on github

    Saturday, May 5, 2012

    How to fix Wcf cache of dynamic Wsdls

    @YaronNaveh

    One of the least used Wcf extension points is IWsdlExportExtension. This extension allows you to customize the wsdl document which Wcf emits. Since you rarely want to do that, this extension is not commonly used. When it is used, it is usually in the context of flattening the wsdl. A different use case I have recently seen is to push dynamic content into the wsdl. More specifically, a user was trying to generate xsd schemas from a live database table and to put them in the wsdl so clients would always get the latest schema. The Wcf service itself was treating the request as Xml anyway so it did not care about such changes. The requirement was for the wsdl to reflect the latest db changes at any time. Our problem was that once the wsdl was generated for the first time it would not be regenerated. This resulted in a stale schema.

    This is how we created the wsdl exporter:


    When we run this service and open the wsdl we get this:


    When we refresh the wsdl after a few seconds we still get this:


    This is not a browser or proxy cache. Wcf does not recreate the wsdl - which can also be seen by putting a breakpoint (which is only called once) on the exporter.

    This behavior makes sense when you consider the case where there is no importer extension - then the wsdl is generated based on the data contract assembly, and as long as that assembly does not change the wsdl will not change either. However we chose to put dynamic logic in the ExportEndpoint method, so that default behavior did not work well for us.

    One way to fix that is to use a message inspector to update the wsdl before it is sent to the client. In this case IWsdlExportExtension is not required at all. This approach is described here.
    An alternative could be to build a Wcf rest endpoint in the same service to act as a proxy to the real wsdl.


    Tweets of the week: It's DevOps Borat

    @YaronNaveh

    Borat likes DevOps:



    Soap / Rest / Wsdl still a hot topic:






    Friday, April 27, 2012

    Tweets of the week: A Wcf love/hate thing

    @YaronNaveh

    Since I opened my twitter account I've seen a few witty insights on web services, so here I share them. If you stumble upon anything worth including in my next list let me know.

    Soap vs. Rest reloaded?




    Wcf love/hate thing:




    Asp.Net Web API to the rescue:



    Wsdl confessions:




    If you see anything worth putting in my next list let me know.


    Declaratively ignoring must understand headers

    @YaronNaveh

    A soap header may specify a "must understand" flag. This instructs any processing node to throw an exception if it does not understand this header. Such a behavior is sometimes useful and sometimes very annoying, depending on the circumstances. Let's see what such a header looks like in soap:


    By default a Wcf service will validate all incoming mustUnderstand headers a client sends. If it does not understand them it will throw the famous 'Did not understand "MustUnderstand" header' exception. Typically you would instruct Wcf not to validate these headers like this:


    But this kind of "hard codes" this behavior to the service. Wouldn't it be nice to decide at the configuration level if we want such a behavior or not?

    All we need to do is define this class:


    Then in the config register it:


    And we can now configure our endpoint(s) with this behavior:


    Wednesday, April 25, 2012

    When EnableUnsecuredResponse *requires* an unsecured response

    @YaronNaveh

    A few weeks ago I had to call a legacy wse2 service from a Wcf client. The service behavior was:

  • Request must be encrypted and signed at the message level
  • Request must contain a timestamp inside the security header
  • Response is neither encrypted nor signed
  • Response nevertheless contains a timestamp inside a security header

    You might think that dismissing the signature requirement from the response would be good for interoperability - after all this is less work. However this time less was more. Turns out that Wcf loves symmetry and does not encourage messages in one direction to be signed and in the other direction to be clear. But hey! This complaint is so WCF 3.5. In 4.0 we got the goodie of EnableUnsecuredResponse:


    When this setting is on Wcf should be ok with an unsigned response. But in my case even with this flag I was still getting this error:

    The security header element ‘timestamp’ with ‘Timestamp-xxxx’ id must be signed.

    As you remember the service returns an unsigned timestamp element. Turns out we have this chain of rules:

    request contains a timestamp and has some signature requirement -->
    the timestamp is always signed (even if we do not wish that) -->
    the response must contain a signed timestamp unless EnableUnsecuredResponse is on. In that case the timestamp is optional, but if present it must be signed.

    So I had to find a way to remove the timestamp from the response. Since the service could not be changed I used my good old friend the custom encoder.

    But even after that I got this error:

    The 'body', 'http://schemas.xmlsoap.org/soap/envelope/', required message part was not signed.

    So WCF was still looking for some ws-security goodies. To solve this I had to remove the security element altogether from the response. Here is the snippet I added to the encoder:


    Many times removing the security element altogether exposes us to risks like replay attacks or a man in the middle. However here we knew up front that the service does not use any interesting security features in the response, so there was no regression.

    Conclusion
    EnableUnsecuredResponse will allow us not to have a security element in the response even if the request has it. But if the response contains a security element nevertheless, then wcf will take it seriously, and if it does not comply with the expected requirements the interaction will fail.


    Saturday, April 21, 2012

    Ws.js - A ws-* implementation for node.js

    @YaronNaveh

    (Get Ws.js on github)

    Some time ago I introduced Wcf.js - a wcf-inspired client soap stack for node.js. However Wcf.js itself is a small wrapper on top of Ws.js - the ws-* implementation for node.js. You got it right: Your node.js apps can now access your core services using various ws-* standards. No more proxies for "protocol bridging", no more service rewrites.

    Get it here.

    Here is a quick sample on what we can do with Ws.js:


    The above example adds a username token to the soap. The output soap will be:


    For detailed usage instructions check it out on github.

    Ws.js currently supports:

  • MTOM
  • WS-Security (username tokens only)
  • WS-Addressing (all versions)
  • HTTP(S)

    Coming up next is probably deeper ws-security support including x.509 certificate encryption and signature. Needless to say, any capability added to ws.js will also apply to wcf.js.

    Here is the project page on github.


    Sunday, April 15, 2012

    I'm on Twitter: @YaronNaveh

    @YaronNaveh

    I just started my twitter account - @YaronNaveh (yeah I'm a late bloomer). I'll be mostly writing about the stuff I love (node.js, wcf, web services) but expect some new stuff too! See you there.


    Sunday, April 8, 2012

    WCF users voice survey

    @YaronNaveh

    Carlos shares the WCF UserVoice channel. That's a great chance for all of us to influence the next version of WCF. If you're a WCF user - go vote.

    My votes go to open source WCF. Like many of the .Net libraries, WCF classes become sealed exactly where I want to change their behavior. This is especially true for all things security. Sometimes it is really important to tweak the way WCF signs a message or handles a signed one, but today this requires implementing a very long chain of extensions.

    The current predominant features also include support for REST in the routing service and web sockets support on earlier windows versions. So far the traditional request to simplify configuration and bindings is not too visible, possibly because the WCF simplification features have done a great deal here (or because people are more focused on Rest these days).


    Friday, March 30, 2012

    Wcf to WebSphere interop: ActivityId is not protected

    @YaronNaveh

    I recently had to call a secured WebSphere service from a Wcf client. Fine tuning all the settings was challenging so I turned on Wcf tracing. The latter gave me detailed errors which helped me to see where I was wrong. But after fixing everything I knew of, I got this new error:

    An element, ActivityId, with nsuri, http://schemas.microsoft.com/2004/09/ServiceModel/Diagnostics, is not protected by signing algorithm, sha1, and digest algorithm Rejected by filter; SOAP fault sent

    Having no idea where this ActivityId element comes from, I took a quick look at the message my Wcf client was sending:

      <s:Header>
        <ActivityId CorrelationId="db5feb51-ae82-4c1b-bd68-1bdb2d09bbc6" xmlns="http://schemas.microsoft.com/2004/09/ServiceModel/Diagnostics">f5aada53-669f-46d7-acc5-8d45e437ed86</ActivityId>
      </s:Header>


    Where does this ActivityId come from?
    Turns out this header is emitted by Wcf when we turn tracing on. Wcf uses this header to color the message as it flows between various layers so it can later show a single view of it.
    In my case the WebSphere expected clients to sign all headers. This particular header was not signed (since wcf tracing just adds it as is) so WebSphere complained about a policy violation.
    After turning off the Wcf trace settings the integration worked like a charm. So you can say the whole issue was kind of a drug side effect.


    Thursday, March 29, 2012

    Xml stack for node.js that works on windows

    @YaronNaveh

    When I developed wcf.js I extensively used xml operations. Finding the right libraries was not an easy task so I thought to share my findings here. My requirements were to use dom style xml parsing and that the whole stack will be multi-platform (read: work on windows). It turned out that there are many libraries that fulfill one of these requirements but it was very hard to find one which fulfills both. Then I wanted to run xpath operations on the dom. And again I needed a library that works on windows and integrates well with the former dom parser.

    I started my journey with googling for "node.js xml parser". I immediately found node-xml which is a pure javascript sax parser. Finding other sax parsers was also easy but that was not what I had in mind. I then moved to "node.js xml dom". This actually led me to the main listing of node libraries sorted by category, and I immediately turned to the xml section. I felt like I was drinking from the firehose: Over 15 xml parsers were listed. It was very disappointing to find out that most of them are based on libxml2 which means they will work on windows only via cygwin. That's evil.


    xmldom
    Just before I started to roll my own xml parser I found xmldom. Xmldom is a pure javascript implementation of dom (and sax) which makes it fully portable to any environment.

    xpath.js
    I also needed an xpath engine. Finding one that is cross platform was no easier. I finally found xpath.js. The latter was actually not written as a node.js module (it dates back to 2006) but it was fairly easy to migrate it there. As you can see here, I just added this method at the end:

    // Evaluate an xpath expression against a parsed dom document
    // and return the matching nodes as a plain array.
    function SelectNodes(doc, xpath)
    {
      var parser = new XPathParser();
      var expression = parser.parse(xpath);
      var context = new XPathContext();
      context.expressionContextNode = doc.documentElement;
      var res = expression.evaluate(context);
      return res.toArray();
    }

    exports.SelectNodes = SelectNodes;

    Making it all work together
    The following sample shows how to parse an xml document and match an xpath on it.
    Note you should include the updated xpath.js as part of your project (e.g. in /lib) and the first line in the sample should reference that path. You should also install xmldom using npm install xmldom.

    var select = require('./xpath').SelectNodes   //the path to xpath.js in your project
      , Dom = require('xmldom').DOMParser

    var doc = new Dom().parseFromString('<x><y id="1"></y></x>')
      , res = select(doc, "//*[@id]") //select all nodes that have an "id" attribute

    if (res.length==1)
      console.log(res[0].localName); //prints "y"


    Saturday, February 25, 2012

    Wcf and Node.js, better together

    @YaronNaveh

    (get wcf.js on github!)

    Take a look at the following code:


    Do you see anything...um, special? Well c# already has the "var" keyword since version 3.0 so maybe it is some kind of a c#-ish dialect? Or maybe it is a CTP for javascript as a CLR language? Or something related to the azure sdk for node.js?

    Not at all. This is a snippet from wcf.js - a pure javascript node.js module that makes wcf and node.js work together!

    As node assumes its central place in modern web development, many developers build node apps that must consume downstream wcf services. Now if these services use ASP.NET Web API it is very easy. It is also a breeze if you are in a position to add a basic http binding to the Wcf service, and just a little bit more work if you plan to employ a wcf router to do the protocol bridging. Wcf.js is a library that aims to provide a pure-javascript development experience for such scenarios.

    Note that building new node.js ws-* based services is a non-goal for this project. Putting aside all the religious wars, Soap is not the "node way", so you should stick to Rest where you'll get good language support (json) and built-in libraries.

    "Hello, Wcf... from node"

    You are closer than you think to consuming your first Wcf service from node.js:

    1. Create a new wcf web site in VS and call it "Wcf2Node". If you use .Net 4 then BasicHttpBinding is the default, otherwise in web.config replace WsHttp with BasicHttp. No need to deploy, just run the service in VS using F5.

    2. Create anywhere a folder for the node side and from the command line enter its root and execute:

    $> npm install wcf.js

    3. In the same folder create test.js:

    var BasicHttpBinding = require('wcf.js').BasicHttpBinding
      , Proxy = require('wcf.js').Proxy
      , binding = new BasicHttpBinding()
      , proxy = new Proxy(binding, "http://localhost:12/Wcf2Node/Service.svc")
      , message = '<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/">' +
                    '<Header />' +
                    '<Body>' +
                      '<GetData xmlns="http://tempuri.org/">' +
                        '<value>123</value>' +
                      '</GetData>' +
                    '</Body>' +
                  '</Envelope>'

    //send the raw soap with its soap action; the callback receives the response soap
    proxy.send(message, "http://tempuri.org/IService/GetData", function(response, ctx) {
      console.log(response)
    });


    4. In test.js, change the port 12 (don't ask...) to the port your service runs on.

    5. Now we can execute node:

    $> node test.js

    6. You should now see the output soap on the console.
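    For a plain BasicHttpBinding service like this one, the same call can also be sketched without wcf.js, using only node's built-in http module. This is an illustration rather than anything wcf.js does internally: buildSoapRequest and sendSoap are hypothetical helpers, and the headers assume Soap 1.1 over http, which is what BasicHttpBinding uses by default.

```javascript
var http = require('http');
var URL = require('url').URL;

// Hypothetical helper: http.request options for a Soap 1.1 post
function buildSoapRequest(serviceUrl, soapAction, envelope) {
  var url = new URL(serviceUrl);
  return {
    hostname: url.hostname,
    port: Number(url.port) || 80,
    path: url.pathname + url.search,
    method: 'POST',
    headers: {
      'Content-Type': 'text/xml; charset=utf-8',
      'SOAPAction': '"' + soapAction + '"',
      'Content-Length': Buffer.byteLength(envelope, 'utf8')
    }
  };
}

// Hypothetical helper: post the envelope and hand the response soap to the callback
function sendSoap(serviceUrl, soapAction, envelope, callback) {
  var req = http.request(buildSoapRequest(serviceUrl, soapAction, envelope), function (res) {
    var body = '';
    res.setEncoding('utf8');
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () { callback(null, body); });
  });
  req.on('error', callback);
  req.end(envelope);
}

// Usage, with the message and port from test.js:
// sendSoap('http://localhost:12/Wcf2Node/Service.svc',
//          'http://tempuri.org/IService/GetData', message,
//          function (err, soap) { console.log(err || soap); });
```

    Once security, addressing or Mtom enter the picture, hand-rolling the request like this stops being practical.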

    Of course this sample is not very interesting and you may be better off sending the raw soap using request. Let's see something more interesting. If your service uses ssl + username token (transport with message credential), the config may look like this:

    <wsHttpBinding>
        <binding name="NewBinding0">
            <security mode="TransportWithMessageCredential">
                <message clientCredentialType="UserName" />
            </security>
        </binding>
    </wsHttpBinding>

    The following modifications to the previous example will allow you to consume it from node:

    ...
    binding = new WSHttpBinding(
            { SecurityMode: "TransportWithMessageCredential"
            , MessageClientCredentialType: "UserName"
            })
    ...

    proxy.ClientCredentials.Username.Username = "yaron";
    proxy.ClientCredentials.Username.Password = "1234";
    proxy.send(...)

    And here is the wire soap:

    <Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/" xmlns:u="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd" xmlns:o="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd">

    <Header>
      <o:Security>
        <u:Timestamp>
          <u:Created>2012-02-26T11:03:40Z</u:Created>
          <u:Expires>2012-02-26T11:08:40Z</u:Expires>
        </u:Timestamp>
        <o:UsernameToken>
          <o:Username>yaron</o:Username>
          <o:Password Type="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-username-token-profile-1.0#PasswordText">1234</o:Password>
        </o:UsernameToken>
      </o:Security>
    </Header>

    <Body>
      <EchoString xmlns="http://tempuri.org/">
        <s>123</s>
      </EchoString>
    </Body>
    </Envelope>
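    To show how little magic there is in that security header, here is a sketch that builds a header of the same shape by hand in plain javascript. It is not how wcf.js does it internally: buildSecurityHeader, escapeXml and stamp are hypothetical helpers, the five minute lifetime is copied from the sample above, and the "o" and "u" prefixes are assumed to be declared on the Envelope as they are there.

```javascript
// Password type uri as it appears in the wire soap above
var PASSWORD_TEXT = 'http://docs.oasis-open.org/wss/2004/01/' +
                    'oasis-200401-wss-username-token-profile-1.0#PasswordText';

// Escape the characters that would break element content
function escapeXml(text) {
  return String(text).replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// UTC timestamp with second precision, in the same format as the sample
function stamp(date) {
  return date.toISOString().replace(/\.\d{3}Z$/, 'Z');
}

// Hypothetical helper: a timestamp plus a clear text username token
function buildSecurityHeader(username, password, now, ttlMinutes) {
  now = now || new Date();
  var expires = new Date(now.getTime() + (ttlMinutes || 5) * 60 * 1000);
  return [
    '<o:Security>',
    '  <u:Timestamp>',
    '    <u:Created>' + stamp(now) + '</u:Created>',
    '    <u:Expires>' + stamp(expires) + '</u:Expires>',
    '  </u:Timestamp>',
    '  <o:UsernameToken>',
    '    <o:Username>' + escapeXml(username) + '</o:Username>',
    '    <o:Password Type="' + PASSWORD_TEXT + '">' + escapeXml(password) + '</o:Password>',
    '  </o:UsernameToken>',
    '</o:Security>'
  ].join('\n');
}

console.log(buildSecurityHeader('yaron', '1234'));
```

    The password travels in clear text inside the message, which is why this mode is paired with ssl at the transport level.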


    If you use Mtom check out this code:

    (The formatting here is a bit strange due to my blog layout - it looks much better in github!)

    var CustomBinding = require('wcf.js').CustomBinding
      , MtomMessageEncodingBindingElement = require('wcf.js').MtomMessageEncodingBindingElement
      , HttpTransportBindingElement = require('wcf.js').HttpTransportBindingElement
      , Proxy = require('wcf.js').Proxy
      , fs = require('fs')
      , binding = new CustomBinding(
            [ new MtomMessageEncodingBindingElement({MessageVersion: "Soap12WSAddressing10"})
            , new HttpTransportBindingElement()
            ])
      , proxy = new Proxy(binding, "http://localhost:7171/Service/mtom")
      , message = '<s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope">' +
                    '<s:Header />' +
                      '<s:Body>' +
                        '<EchoFiles xmlns="http://tempuri.org/">' +
                          '<value xmlns:a="http://schemas.datacontract.org/2004/07/" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">' +
                            '<a:File1 />' +                   
                          '</value>' +
                        '</EchoFiles>' +
                      '</s:Body>' +
                  '</s:Envelope>'

    proxy.addAttachment("//*[local-name(.)='File1']", "me.jpg");

    proxy.send(message, "http://tempuri.org/IService/EchoFiles", function(response, ctx) {
      var file = proxy.getAttachment("//*[local-name(.)='File1']")
      fs.writeFileSync("result.jpg", file)    
    });

    Mtom is a little bit trickier since wcf.js needs to know which nodes are binary. Using simple xpath can help you achieve that.
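    Some background on why the binary nodes matter: without Mtom, binary content has to sit inside the soap as base64 text, while Mtom moves it out of the xml into its own mime part. The small sketch below, plain node with no wcf.js involved, shows the size cost of the base64 route; base64Length is a hypothetical helper and the 3000 byte buffer is an arbitrary stand-in for a file like me.jpg.

```javascript
// Hypothetical helper: number of characters needed to inline these bytes as base64
function base64Length(bytes) {
  return bytes.toString('base64').length;
}

var fakeFile = Buffer.alloc(3000, 0xff); // stand-in for real file content
var inlined = base64Length(fakeFile);

console.log('raw bytes:    ' + fakeFile.length);
console.log('base64 chars: ' + inlined);
console.log('overhead:     ' + Math.round((inlined / fakeFile.length - 1) * 100) + '%');
```

    Every 3 raw bytes become 4 base64 characters, so for large attachments the saving from sending them as separate parts adds up.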

    Getting your hands dirty with Soap
    Wcf.js uses soap in its raw format. Code generation of proxies does not resonate well with a dynamic language like javascript. I also assume you are consuming an existing service which already has working clients so you should be able to get a working soap sample. And if you do like some level of abstraction between you and your soap I recommend node-soap, though it still does not integrate with wcf.js.

    If you use raw soap requests and responses you will need a good xml library. And while node has plenty of dom / xpath libraries, they are not windows friendly. My next post will be on a good match here.

    Supported standards
    Wcf implements many of the ws-* standards and even more via proprietary extensions. The first version of wcf.js supports the following:

  • MTOM

  • WS-Security (Username token only)

  • WS-Addressing (all versions)

  • HTTP(S)

    The supported bindings are:


  • BasicHttpBinding

  • WSHttpBinding

  • CustomBinding

    What do you want to see next? Let me know.

    Get the code
    Wcf.js is hosted in GitHub, and everyone is welcome to contribute features and fixes if needed.
    Wcf.js is powered by ws.js, the actual standards implementation, which I will introduce in an upcoming post.