We have here a Python API for fetching data from Mongo and either returning the raw JSON or a formatted, “parsed,” one. There is certainly a number of things that could be improved (it has been written by a non-programmer and no Python expert, so it is actually a real achievement for him) but what I want to focus on is the API exposed to the clients.
It troubles me because the API exposes too many details about its inner workings and forces the clients to know them.
Here is the code:
| … | |
| # The client code | |
| @my.route('/cleanjson/<collection>/') | |
| def getcleanjson(collection): | |
| cleandata = Cleanjson() | |
| cleandata.get_mongojson(str(collection), "?s={'_id': 1}") | |
| cleandata.parse_json(cleandata.rawjson) | |
| return cleandata.cleanjson |
| # The API code | |
| … | |
| class Cleanjson: | |
| … | |
| def __init__(self): | |
| self.cleanjson = [] | |
| … | |
| def get_mongojson(self, collection, query): | |
| self._make_requesturl(collection, query) | |
| alldata = requests.get(self.request_url).json() | |
| self.rawjson = alldata['rows'] | |
| def parse_json(self, datalist): | |
| cleaned = [ { "date": datalist[i]['timestamp'], … } | |
| for i in range(len(datalist)) ] | |
| self.cleanjson = json.dumps(cleaned) | |
| return self.cleanjson |
Disadvantages It will be hard to change the internal organization of Clearjson once it’s widely used and the clients need to know and do too much, while the only thing they really care about is getting data, either raw or pretty formatted. And there is an invisible temporal coupling: the three methods/fields must be called in this particular order.
This is what I would have preferred (though, generally speaking, still far from being perfect):
| … | |
| # The client code | |
| @my.route('/cleanjson/<collection>/') | |
| def getcleanjson(collection): | |
| return Cleanjson().get_mongojson(str(collection), "?s={'_id': 1}", format=True) |
| # The API code | |
| … | |
| class Cleanjson: | |
| # No self.cleanjson attribute anymore | |
| def get_mongojson(self, collection, query, format=False): | |
| request_url = self._make_requesturl(collection, query) | |
| alldata = requests.get(request_url).json() | |
| rawjson = alldata['rows'] | |
| if format: | |
| return __parse_json(rawjson) | |
| else: | |
| return rawjson | |
| def __parse_json(self, datalist): | |
| cleaned = [ { "date": datalist[i]['timestamp'], … } | |
| for i in range(len(datalist)) ] | |
| return json.dumps(cleaned) |
Improvements The code now corresponds to what the client actually wants (getting data, with some variations such as formatting), places much less cognitive load on the client since there is only one method to call, something you can hardly do wrong. The API is simpler to understand, simpler to use, less bug-prone. And we are free to evolve and re-arrange the internals of Clearjson however we see fit, as long as we keep the signature of the single method.
Objections Somebody might dislike the increased complexity of get_mongojson that now has three parameters (Clean Code recommends as few parameters as possible, three being max) but I think it is a small price for the overall improvement, especially given that there is a reasonable default value and that we can use named arguments.