Thursday, 2 June 2016

Caching means Retention and other Web API quirks

"Fun here becomes related to formal logic and repetition, to the question of where software starts and ends, to mental states, to what operations it can carry out on the world, to the cultures and usages of software, to its building upon itself, to its aesthetics"
Olga Goriunova
For everyone who found the other 3 posts in this series too philosophical: relax, this is a technical one.

Thanks Daniel for suggesting Vinay Sahni's "Best Practices for Designing a Pragmatic RESTful API"; I like that he also emphasizes documentation and what works in the real world, and still believes in some principles like links. Also, Pedro's "List of HTTP API Specs", thanks to Mike for this tweet. And of course the ubiquitous REST Cookbook and the always good Apigee guides (I would reference CA Layer 7 here as well but they hide their guides behind a spamwall), the pragmatic API blog and Steve's rant.

As the second-to-last piece in this series, let's have a look at some technical quirks of Web APIs.




What lies behind REST's boundary


Take, for instance, faceted search (by the guy who wrote "Mapping Experiences"), a far more complex way of interacting with a website, which I've used in the past. Instead of navigation, we used a combination of search, filters and context to allow a more fluid experience. I mentioned pagination in my first post and how hard it is to get right, especially when lots of concurrent refreshes, sorting, filtering, masking and state changes are involved. The same goes for bot interfaces, gestures, or more generally reactive and ubiquitous interfaces. You cannot send the full context, or state of the system, around with every request; you might not even know it, or you might not want to depend on it. If you look at programmatic content, dozens of context parameters already define an experience. In my opinion, PATCH is already RPC, and while it is needed, it shows this limitation of the original HTTP spec. Users don't think in CRUD (I am aware this is the wrong metaphor for REST), they don't care whether the interface is universal (which is both good and bad and, I am aware, too easy to critique), and they certainly don't think in idempotency, but our APIs need to fulfill human needs for the developer too. Yes, there is Method Overriding to push more complex data that is impossible to send via GET, or data that should not be seen in logs and proxies, but it feels dirty and breaks the idea.

However, REST has its advantages and a large number of properties that are just generally a good idea. I have mentioned clear domain-driven state transitions on otherwise immutable data (messages, for instance), content negotiation and the idea of a universal interface. Or caching. I could write a (short) book about that, but Google and Heroku did a better job at it. Maybe the only hint I can give you is: free yourself from seeing caching as a buffer for performance. Caching forces you to think about state as a first-class architectural citizen, which is great. First and foremost, caching is functional information about data quality and retention. There is even the Warning header to give more functional information about data retention! Sometimes you get requirements like "this data must not be cached" - but you cannot prevent someone from displaying a response for as long as they want. If you have such a requirement, you need invalidation or reconciliation, not anti-caching. For whatever data you have, you must define its validity and discuss what that means for the API consumer and the user experience. If your user experience has different sub-states and data validity rules, you might also need to modularize your code differently and serve it differently.
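
To make "caching is information about validity" a bit more tangible, here is a minimal Python sketch of how a consumer could read the remaining validity of a response from its Cache-Control and Age headers. The function names are made up for this post, the headers are assumed to arrive as a plain dict with exactly these key spellings, and only a handful of directives are handled; treat it as an illustration, not a complete implementation of HTTP caching rules.

```python
from typing import Mapping, Optional


def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control header value into a dict of lowercase directives."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        # Directives without an argument (e.g. "public") map to None.
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def remaining_validity(headers: Mapping[str, str]) -> Optional[int]:
    """Seconds the publisher still vouches for this data.

    Returns 0 when the data must be reconciled with the origin before use,
    and None when the publisher made no statement at all.
    """
    directives = parse_cache_control(headers.get("Cache-Control", ""))
    if "no-store" in directives or "no-cache" in directives:
        return 0
    max_age = directives.get("max-age")
    if max_age is None or not max_age.isdigit():
        return None
    age = headers.get("Age", "0")
    age_seconds = int(age) if age.isdigit() else 0
    return max(0, int(max_age) - age_seconds)
```

The interesting case is the `None` result: a response without any validity statement is exactly the situation where the API owner and the consumer still have to talk about what "valid" means.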

Also, ETags make a nice state indicator, irrespective of caching for performance reasons - you can have a distributed cache (Redis, Cassandra, Hazelcast) that links UIDs for representations to their current resource state. You can even push that further and use ElasticSearch as your primary representation store in a potentially inconsistent CQRS model so that you can query it more flexibly, and sync it back to a transactional document store, such as CockroachDB or CouchDB, using a solid queue. Or, just use RethinkDB or Firebase if you fancy that kind of magic.
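
A rough Python sketch of the ETag-as-state-indicator idea follows. The in-memory dict `state_index` stands in for the distributed cache, and `etag_for`, `put_resource` and `get_status` are hypothetical helpers invented for this example; a real service would sit behind an HTTP framework and a real store.

```python
import hashlib
import json
from typing import Dict, Optional, Tuple

# Stand-in for a distributed cache: resource id -> ETag of the current state.
state_index: Dict[str, str] = {}


def etag_for(representation: dict) -> str:
    """Derive an ETag from the canonical JSON form of a representation."""
    canonical = json.dumps(representation, sort_keys=True, separators=(",", ":"))
    return '"' + hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16] + '"'


def put_resource(
    resource_id: str, representation: dict, if_match: Optional[str] = None
) -> Tuple[int, Optional[str]]:
    """Return (status, current ETag); 412 if the caller's view of the state is stale."""
    current = state_index.get(resource_id)
    if if_match is not None and if_match != current:
        return 412, current
    new_tag = etag_for(representation)
    state_index[resource_id] = new_tag
    return (200 if current else 201), new_tag


def get_status(resource_id: str, if_none_match: Optional[str] = None) -> int:
    """Return 304 if the caller already holds the current state, else 200 or 404."""
    current = state_index.get(resource_id)
    if current is None:
        return 404
    return 304 if if_none_match == current else 200
```

Note that nothing here is about performance: the ETag only answers "is what you hold still the current state?", which is what the If-Match check needs in order to reject an update based on an outdated view.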

Some good practices from my experience


Much can be said about status codes, and there are never enough of them, but here are a few tricks that I found handy over time:
  • Understand 200 just means OK - it just means the server has understood what you said (It's like a "Yes" in Asia ;) )
  • Use a 202 status code and Location header for asynchronous processing - you may even think about chunked responses or ranges (206, see below). The good thing is, the client can send a Prefer header (RFC 7240, for example respond-async) to choose its preferred behaviour (very good for occasionally connected applications).
  • All HTTP responses can have a payload - make use of a proper error format, e.g. a common JSON error response format (interestingly, JSON Schema Validation does not come with one), in order to distinguish between application errors and infrastructure errors such as 404, and make sure it is included in your content negotiation. Some examples are a 2013 IETF Draft and vnd.error, which is HAL-based. But as a consumer, make sure you quickly check whether the response is in that format, otherwise you end up parsing 500s with ultra-long stacktraces...
  • Some error codes seem too special in the beginning but become really important when you think about it, my favourites are 409 (Conflict) and whatever is used by CloudFlare in their nice machine-readable error format for connection problems
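
The consumer-side "quick check" from the list above could look roughly like this in Python. The two media types and the size limit are assumptions chosen for illustration (use whatever error format your API actually negotiates), and `parse_api_error` is a made-up helper name.

```python
import json
from typing import Optional

# Media types assumed to carry structured application errors.
ERROR_MEDIA_TYPES = {"application/problem+json", "application/vnd.error+json"}
# Arbitrary cap so we never try to parse a huge stack trace.
MAX_ERROR_BYTES = 16 * 1024


def parse_api_error(status: int, content_type: str, body: bytes) -> Optional[dict]:
    """Return the structured application error, or None.

    None means: not an error, an infrastructure error, or an unknown format.
    """
    if status < 400:
        return None
    # Ignore parameters such as "; charset=utf-8".
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type not in ERROR_MEDIA_TYPES or len(body) > MAX_ERROR_BYTES:
        return None
    try:
        payload = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        return None
    return payload if isinstance(payload, dict) else None
```

Anything that comes back as `None` with a status of 400 or above is best treated as an infrastructure problem (proxy, gateway, crashed server) and logged by status code only.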

I mentioned Method Overriding above as an example of HTTP headers; a few other really helpful headers are:
  • Understand the power of Content Negotiation and Encoding headers for performance (zipping) and internationalization, not only for REST resource descriptions and data types
  • Learn the differences between CORS and CSP - the latter might solve your problem better - and how the Forwarded header can help you move such information closer to your service (yes, RFC 7239 standardized it, welcome to the future!)
  • Understand why the Authorization header should be used to carry tokens such as a JWT, and not other headers and never GET parameters, and why you might not want to rely only on Cookies (mainly because too many legacy servers link them to session state, which requires sticky sessions and is not RESTful)
  • The Pinning Headers from RFC 7469 which enhance your client's privacy (in addition to your proper configuration of SSL parameters like ciphers and Cookies)
  • The X-Request-ID semi-standard which is really helpful as a correlation ID to pass across layers, helpful for debugging and performance monitoring
  • As mentioned above, the Location header is very helpful for asynchronous processing or streaming responses (chunked data) but should also always be returned from PUT, POST or PATCH (Hypermedia, the good parts). Check Alt-Svc as a cheap way of geographic optimization or legacy integration (e.g. URL versioning; I always thought it's a bit of a pity it does not come with a qualifier, e.g. for different SLAs or versions)
  • And yes, even though I despised it before, the Link header and especially the Warning header might be helpful (see below), but check first if a Range or asynchronous request might not make more sense (for pagination)
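
As a small illustration of a few of the headers above, here is a Python sketch that assembles the headers for an outgoing call from one service to the next. The function name and parameters are hypothetical; the point is only that the token travels in Authorization (never in the URL) and that an incoming X-Request-ID is passed on unchanged so it can serve as a correlation ID.

```python
import uuid
from typing import Dict, Optional


def outgoing_headers(
    token: str,
    incoming_request_id: Optional[str] = None,
    language: str = "en",
) -> Dict[str, str]:
    """Build headers for a downstream request."""
    return {
        # Bearer token in the Authorization header, not in a query string.
        "Authorization": f"Bearer {token}",
        # Reuse the caller's correlation ID; mint one only at the edge.
        "X-Request-ID": incoming_request_id or str(uuid.uuid4()),
        # Content negotiation: format and language.
        "Accept": "application/json",
        "Accept-Language": language,
    }
```

Logging the X-Request-ID value at every layer is what makes it useful for debugging and performance monitoring later.
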
The same goes for verbs and the general flow. I have mentioned that hyperlinks are good for optimising Flow inside one context, but not for streams and events across contexts. There are a few ways to change this:
  • If you are not sure what I meant above with REST is not CRUD, read again on POST vs. PUT
  • Also, I have mentioned the PATCH method. There has been some discussion around it recently, although it has been a standard since RFC 5789; the core is to understand that for updating you should still use PUT, but you can use PATCH with instructions on how to change the resource, which can be considered a message/action/intent in the redux/dataflow sense
  • For me, PATCH is the natural counterpart of SSE and Websockets, where they would be a UI change event stream (the problem here is state ordering/reconciliation, as mentioned above)
  • I think HEAD requests are wildly underused, especially if you use headers and caching a lot. Take a look and consider using them for instance for eager fetching (better user experience) of potentially cached resources or to get range, link, location and alternative service options (in a 300 return on a high-level resource)
  • Speaking of which, the OPTIONS request is even more obscure, but in my opinion even more useful. If you use nice Link headers it might very well describe all hyperlinks that make sense (as you know, I don't think that's many) in order to get rid of formats such as JSON API (a.k.a. SOAP over JSON), allowing you to properly separate REST constraints from content types
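
To illustrate the PUT versus PATCH distinction from the list above, here is a toy Python sketch. The operation format (`set`, `remove`, `increment`) is invented for this example and is not JSON Patch or any other standard; `put` and `patch` are hypothetical helpers working on a plain dict as the store.

```python
import copy
from typing import Any, Dict, List


def put(store: Dict[str, dict], key: str, representation: dict) -> dict:
    """PUT: replace the whole state; sending it twice gives the same result."""
    store[key] = copy.deepcopy(representation)
    return store[key]


def patch(store: Dict[str, dict], key: str, operations: List[Dict[str, Any]]) -> dict:
    """PATCH: apply a list of instructions (intents), all or nothing."""
    draft = copy.deepcopy(store[key])
    for op in operations:
        kind, field = op["op"], op["field"]
        if kind == "set":
            draft[field] = op["value"]
        elif kind == "remove":
            draft.pop(field, None)
        elif kind == "increment":
            draft[field] = draft.get(field, 0) + op["by"]
        else:
            raise ValueError(f"unknown operation: {kind}")
    # Only commit once every instruction has been applied successfully.
    store[key] = draft
    return draft
```

Replaying the same PUT is harmless, while replaying a PATCH that contains an `increment` changes the state again; that is the ordering/reconciliation problem in miniature, and also why such instructions read more like messages than like resource updates.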
And that's it really. What a refreshingly technical post.
