Open source bargain
This blog post started as a Twitter conversation between me and @realjenius about Ehcache, but I needed a little more room to make my point.
Ehcache is the most widely used open source Java caching library. Terracotta (my former employer) bought Ehcache last year and I was intimately involved in tending it while I worked at Terracotta … just to make my relationship to this clear.
Every so often someone notices that Ehcache pings a Terracotta server when it is instantiated, and periodically thereafter, and sends it the following information:
- Operating system name (os.name)
- Java VM name (java.vm.name)
- Java version (java.version)
- Platform (os.arch)
- Terracotta version (if applicable)
- Terracotta product name/version (if applicable)
- Uptime
- Hash of IP address – used as a fingerprint for correlation
The Terracotta update check server sends back information about whether a newer version of Ehcache exists. If so, a message is displayed to the console.
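Every item in the list above except uptime and the address hash is an ordinary JVM system property, which makes it easy to see exactly how little is involved. Below is a minimal sketch of assembling such a payload, assuming a hypothetical class `UpdateCheckParams`, hypothetical query parameter names, and SHA-256 as the hash (the post does not say which hash Ehcache uses). It omits the optional Terracotta version fields and is not Ehcache's actual UpdateChecker source.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch of the payload described above; not Ehcache's UpdateChecker source.
// Parameter names and the SHA-256 hash are assumptions.
final class UpdateCheckParams {

    private UpdateCheckParams() {}

    /** Collects the listed environment values, in order, from standard JVM system properties. */
    static Map<String, String> collect(long uptimeMillis, String ipAddress) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("os.name", System.getProperty("os.name"));
        params.put("java.vm.name", System.getProperty("java.vm.name"));
        params.put("java.version", System.getProperty("java.version"));
        params.put("os.arch", System.getProperty("os.arch"));
        params.put("uptime", Long.toString(uptimeMillis));
        // Only a one-way hash of the address is included, usable as a correlation fingerprint.
        params.put("source", sha256Hex(ipAddress));
        return params;
    }

    /** Encodes the values as a URL query string for an HTTP GET. */
    static String toQueryString(Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(String.valueOf(e.getValue()), StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    /** Lower-case hex SHA-256 of the input. */
    static String sha256Hex(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every JVM", e);
        }
    }
}
```

Hashing the address means the server can tell repeat pings from the same machine apart without ever receiving the address itself.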
This, of course, is no different in operation from what most of the other software on your desktop, and most open-source app servers, do these days. I’ll admit it’s a little unusual for a non-server library to perform this kind of check, but I think that line is pretty gray if you ponder it for a minute.
Generally, people seem upset when they find this out and feel like the library is spying on them or in some way intruding on their application. I’ve listed above the information that is actually being sent and it’s nothing nefarious (being open source, you’re welcome to peruse the code yourself).
The update check is made in a separate background thread – it will time out if there is no response due to network setup, and it safely handles conditions where thread creation is not allowed (Google App Engine most notably). You can turn the update checker off in the Ehcache XML configuration (with updateCheck="false" on the root <ehcache> element), programmatically if you create caches dynamically, or VM-wide with the system property -Dnet.sf.ehcache.skipUpdateCheck=true.
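The switches and background-thread behavior described above follow a common pattern, sketched here in plain Java. Only the `net.sf.ehcache.skipUpdateCheck` property name comes from the post; `UpdateCheckLauncher`, `maybeStart`, and `fetch` are hypothetical names, and `configEnabled` stands in for the `updateCheck` attribute on the root `<ehcache>` element or its programmatic equivalent. In practice, the VM-wide switch is just `-Dnet.sf.ehcache.skipUpdateCheck=true` on the java command line, or a `System.setProperty` call early in startup (assuming the property is read when Ehcache initializes).

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of launching an update check; only the property name comes from the post.
final class UpdateCheckLauncher {

    /** VM-wide opt-out, e.g. -Dnet.sf.ehcache.skipUpdateCheck=true */
    static final String SKIP_PROPERTY = "net.sf.ehcache.skipUpdateCheck";

    private UpdateCheckLauncher() {}

    /**
     * Runs {@code check} on a background daemon thread unless it is disabled.
     * Returns true only if a thread was actually started.
     */
    static boolean maybeStart(boolean configEnabled, Runnable check) {
        if (!configEnabled || Boolean.getBoolean(SKIP_PROPERTY)) {
            return false; // either switch turns the check off
        }
        try {
            Thread t = new Thread(check, "update-check");
            t.setDaemon(true); // never keeps the host application's JVM alive
            t.start();
            return true;
        } catch (SecurityException e) {
            // Sandbox forbids creating threads: skip the check rather than fail the application.
            return false;
        }
    }

    /** Fetches a response with connect/read timeouts; returns null on any I/O failure. */
    static String fetch(String address, int timeoutMillis) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) URI.create(address).toURL().openConnection();
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            return null; // no network, firewall, or timeout: give up quietly
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }
}
```

Making the thread a daemon and catching `SecurityException` are assumptions about how such a check should behave; the post only says that it runs in the background, times out, and tolerates environments where thread creation is not allowed.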
Terracotta expects and recommends that any production deployment of Ehcache will turn off the update check, just as it would likely turn off the update check in Glassfish, or any other such software.
I think it might be helpful to consider why Terracotta/Ehcache would want such a check in the first place. This information gives Terracotta a metric of adoption: how widely, and how, Ehcache is being used. That information can be fed through the marketing and business sides of Terracotta. Those numbers let Terracotta convince investors that people use the library, and consequently get funding to pay the salaries of the world-class team at Terracotta and the machines in the giant perf lab that makes Ehcache awesome.
The information about Ehcache versions and OS/JVM environments tells Terracotta how to place emphasis during QA. If 80% of users run on Linux, then it makes sense to focus testing efforts on that platform. Similarly, if a small but significant number are running JDK 1.5 then that might keep it in the QA and support matrix for longer. Again, this information lets Terracotta put limited financial resources to the most efficient use to make Ehcache awesome.
While these features may initially feel intrusive, I think on some reflection that they are not really doing anything evil or scary, that they are easy to turn off, and that while they provide value to Terracotta, they also provide value to the user, both in version information in the short term and in an awesome product in the long term.
Sometimes I think people underestimate the amount of engineering work that goes into an open source product like Ehcache or Terracotta or Quartz, especially one backed by an actual company. Terracotta as a company employs a team of a couple dozen people who are creating truly world-class products, equal in innovation and quality to any number of commercial, non-open source, non-free products. But to make that work financially, there must be some part of the products that actually provides revenue. Small things like an update check actually make a big difference on the business side of the equation, both in growth and efficiency. I think if you consider it in those terms, you’ll find that the trade-off of information for engineering value is still weighted heavily in favor of the user.
The first suggestion people always make about the update check is why can’t it default to off? The answer to that should be obvious – no one would take extra steps to turn it on and provide that information (even if it is harmless). There is no point in having the update check code if it is not on by default. You are welcome to read that rationalization as evil if you want, but I think that’s naive.
Google shows you ads when you use their free search engine – this trades your attention (and occasional clicks) for a valuable service. Any other Internet “free” service is asking you to participate in a trade of something (often attention to ads or personal information) in exchange for a valuable “free” service. The Ehcache update check is really nothing different – an exchange of information for free use of a great piece of software. I personally see no moral issue with making an exchange of this information for great (free) software – seems like a bargain to me.

Hi! My name is Alex Miller and I live in St. Louis. I write code for a living and currently work for
I’d rather see a notification that the info is being sent, with the option to cancel or turn it off. I have no problem with opting in being the simplest path. See how Eclipse collects metrics for a good example.
@Heath: I think that’s a reasonable option to talk about. Although personally I wish Eclipse would just shut the heck up about collecting usage info and do it.
@Heath: Ehcache isn’t a GUI app to be able to go that route.
Alex,
I appreciate the thorough explanation of why EhCache has the update checker, and I can certainly empathize with the honest and legitimate reasons for its existence.
One of the things that makes this a bit shocking initially is the fact that this is something that is traditionally not done by system libraries like EhCache. I don’t know of any other cases. Now, I certainly don’t claim any ‘big brother’ concerns from it, however I can imagine some of our vendors being confused/concerned that our application is trying to connect to some remote server. I can appreciate the comparison to applications (like Eclipse/Intellij) and servers (like Glassfish), and certainly they have a lot of precedent; however it is most decidedly not apples to apples. EhCache is not a server, nor is it an application. It’s a part of the machine, not the machine itself.
From a technical standpoint, one way to see the distinction is that EhCache is borrowing resources from my application’s runtime (not from its own) to phone home (altruistic reasons, or not).
As I mentioned over Twitter, we use a lot of libraries in my application under a variety of compatible open source licenses; it would be more than a headache (and a large waste of bytecode) if all of those libraries attempted something like this. We’d be forced to patch, hack, or abandon them. I am thankful that EhCache has a reliable, likely bug-free, and unobtrusive update checker that can be turned off. However, even so, it feels like an overreach for the library itself to be doing something like this, no matter how much engineering has gone into it. Bugs like this one show how something as simple as an update checker can cause undesired side effects: https://jira.terracotta.org/jira/browse/EHC-601
As developers of a product at my company, we have a time for evaluating our library versions and the usage of libraries: during development and release engineering for the versions of /our/ application, not during application execution in our clients’ environments. From our perspective, this check happens at the wrong time. Running it every time my product starts feels inconsistent with what it is really trying to achieve from a user standpoint: letting me know that there is a newer/stronger/better/faster version of the library. That’s why so many companies offer newsletters, RSS feeds, and the like. From a user’s perspective, if I’m interested in the news that there is a new version, I get it at an expense of my choosing.
Admittedly, those schemes don’t help Terracotta get their information about their user-base, which leads to your next point:
> The first suggestion people always make about the update check is why can’t it default to off? The answer to that should be obvious – no one would take extra steps to turn it on and provide that information (even if it is harmless). There is no point in having the update check code if it is not on by default. You are welcome to read that rationalization as evil if you want, but I think that’s naive.
You are correct that /all/ opt-in forms of communication abandon a lot of the user base, but there is a reason most companies still choose opt-in forms of communication over opt-out: it is obtrusive to force a user to opt out. It bothers me (and I don’t think I’m alone in this) when I have to opt out of an email newsletter when I download a version of an open-source product (like Jasper Reports); why shouldn’t it bother me to have to tell my application, programmatically or declaratively, to opt out of a phone-home update check? Especially if I upgraded (or Maven upgraded me) from a version with no update check to one that /did/ have one. Now I’m forced to change either my application’s startup script or its configuration file to shut this aggressive new feature off (or roll back, of course).
I appreciate Terracotta’s position, but I think there are less obtrusive ways to achieve the goals of notifying users of new versions and still getting decent metrics about usage of the product.
I certainly value your opinion; we may just have to agree to disagree on this one.
@R.J.: I hear ya. I think your comments are valid – we had many of these same discussions at Terracotta before release. Rather than “agree to disagree”, I’d rather think creatively about ways to satisfy business requirements and still make users happy. That to me is a productive way to move forward. I’m not at Terracotta anymore so I can’t speak for them but I don’t believe anyone there is entirely happy with the current state as the ultimate solution, just the best one we could come up with. As engineers, we’re always willing to think about a better way to build things.
Yeah, I don’t mind that type of data gathering. Seems pretty harmless to me as long as they keep it at that and keep it confidential. And of course, turn it into more open source applications that we can use on our computers.
@R.J.
You didn’t provide any alternative solutions. I’d be curious to hear what other solutions you think exist?
Hey Guys,
I think the information being pulled is harmless, but in the end I don’t think that is the actual point. It is information that the user does not know is being sent out and collected. I think the problem is not in the information but in the fact that a setting you think is only checking for an update is, under the covers, sending info about you. I would want this to be a setting, even if that setting defaults to yes. You should be able to turn it off. There should also be clear notification that this information is being sent. I am a little surprised that this was not the default position for open source development.
from
qwarlock
Great comments, Alex. I produce FLOSS too, and I totally understand the tradeoffs involved with trying to gather data about users. I didn’t see mention of download statistics. Is that useful for you? You could host your own download servers, and you could track downloads, IP addresses, and re-downloads over time.
I blogged about this a while back: http://adammonsen.com/post/512 … I am definitely one of those folks angered and surprised at the choice of Terracotta to enable a feature like this in a library. I’d say phoning home is rare for software without an interactive user interface. And even software with an interactive interface generally asks the user, since initiating arbitrary connections is something a user should explicitly approve!
If there were versions of ehcache and quartz that did not phone home, I would certainly prefer using those libraries rather than the versions with UpdateChecker enabled. Since they’re both Apache 2.0-licensed, I suppose anyone could certainly maintain forks with this one-line change for both modules.
@Adam: Terracotta does track downloads and things like that and that’s one piece of the picture. As for the rest, I don’t think I have any comment beyond what I’ve already said. Being OSS, there is of course no reason someone could not do what you suggest.
To be constructive, I can think of many other, less bothersome proxy metrics that could measure essentially the same thing without requiring the library itself to handle a non-library-centric task.
1) Check what kind of computers are accessing the API docs online. As this check is only supposed to be on developer machines, it is a reasonable assumption that many of these devs are going online. I bet you could even check what version of Java they are running by investigating the version of the applet you could instantiate in their browser. At the very least, you’ll always get the operating system and usually whether the machines are 32-bit or 64-bit.
To measure the JVM usage, you probably can just rely on Sun’s published adoption rates of various versions of the JVM, unless you have a specific reason to think that EhCache users are going to be substantially more or less conservative about upgrading. You could recreate these statistics from published OS package downloads for each OS and figure out what the major Java platforms are that people are using.
2) Integrate this check into the build process. When people build EhCache from source (it’s one of the first things I do as a dev to make sure I have some idea how the software works), you could have the build task ask if you’d be willing to supply some anonymous information about the build environment. I bet most devs would say yes.
I think that if you deployed these artifacts via Maven, Maven sends a lot of this information when it downloads new packages. Then you could just passively comb the logs for the data you want, and you wouldn’t even have to rely on devs building from source. You could get this information by compiling EhCache for Java 1.4, 1.5, and 1.6 and just checking which version was downloaded the most. I think this support is built into Maven too. If a dev is using a distributed cache, I think he’s most likely sophisticated enough to know which version of Java he’s using, too, so making multiple versions targeted for different platforms shouldn’t be considered too onerous given the targeted demographic.
—
A far better business proposition would be to set up an EhCache affiliate program, where you could collect even better data from the very devs who are probably most interested in purchasing complementary products. Send out a short survey, which could even include questions about team size and how it’s being used. As long as the survey seems relevant and practical, most people will fill it out. These would also be the best people to market new products, updates, and support services to.
The Clojure community uses these sorts of surveys to great effect. They measure which IDEs people are using, which version they’re using, and how likely they are to use betas. In my opinion, these measurements could probably inform far more business decisions than a cheap robocall.
I myself would be a little bothered by the making of a new thread. I don’t even like my libraries to spawn threads for me unless they let me pass a threadpool or unless they are very clear about why they are doing it. Think if you were using a math library that shelled out to bc every time you made a call to add.
Just my 2 cents,
Mark
You can turn the update checker off either in the Ehcache xml configuration (with updateCheck=”true”
Sorry about the previous post, pressed enter too quickly.
When you write:
You can turn the update checker off either in the Ehcache xml configuration (with updateCheck=”true”…
You probably mean:
You can turn the update checker off either in the Ehcache xml configuration (with updateCheck=”false”
Cheers,
Yanik
First of all, UpdateChecker is a misleading name for a class that is used for tracking usage. UsageTracker would be more appropriate and honest.
Secondly, this should be an opt-in mechanism. Terracotta should publicise the fact that by turning it on you help make the software better. Lots of successful open-source software use this mechanism.
Thirdly, if it’s not an opt-in mechanism it should be very clear to people that EhCache behaves this way. I don’t want my customer to show me a log of their firewall and ask me some hard questions as to why my software is making connections to some totally unrelated host.
Finally, for what it’s worth, I’ll be wary of Terracotta products from now on, and I’ll gladly tell those around me to do the same.
I just happened to glance at the Tomcat logs of the Java app I’m architecting and developing and nearly fell off my chair when I saw I was being informed that a new version of ehcache was available.
Honestly, what numpty decided that a *third-party library* ought to be phoning home at *runtime*? I assume this originated in Terracotta marketing since any developer coming up with this idea ought to be shown the door, and certainly wouldn’t be employed by me.
The clients of my application relish security and would be having fits at this unauthorised traffic. We, as a company, would suffer from a lack of trust.
The proper way to gather such information is during the development and release lifecycle.
No ifs, no buts, no excuses.
We are now reviewing our use of ehcache and quartz.
Shameful.
@Peter Cameron: +1000
I work on a large scale product where security and reliability are very important.
Our clients’ trust is very hard to earn; I can’t even begin to imagine their reaction if I hadn’t read this blog completely by accident.
I’m simply speechless that a third party library does that kind of stuff.
We have been using Terracotta for some years now and are growing more and more frustrated with its lack of backward compatibility (it took us almost 2 months to get from V3.0.1 to 3.3.0 because Spring support was discarded without any warning to paying customers).
Not to mention the poor documentation (links that work stop working almost from one week to the next).
And now that?!?!?!
Incredible !!!
Utterly lame, with super-lame sprinklings on top.
This has the potential to cause all-kinds of security problems, and trust problems with customers. It could potentially trigger network security systems in some cases.
EhCache and other Terracotta products… never again.
Trust takes years to build but is quickly destroyed.
Ultimately, your explanation boils down to: Terracotta wants to be able to make money off of ehcache.
The honest way to make money from ehcache would be for Terracotta to develop new versions as pay software rather than free software, and, you know, charge money for it. Quietly collecting valuable information – it must be valuable, or it wouldn’t drive business decisions like keeping dozens of people employed – for a supposedly free product is … not so honest.