Ask HN: What is the best way to calculate percentiles of streaming data?
Hello,
I need to write a Python function that iterates through incoming requests and dynamically calculates percentiles of the request body size. Which library or algorithm do you guys recommend?
Example: requests arrive in batches. Let's say the first batch has 50 requests, the next one has 80, and so on. I need to calculate percentiles of the body size of each request.

I think you need to provide a bit more info. Are you using Apache Kafka? Something else? The function would be individual batch requests divided by total requests, multiplied by 100, but I don't think that's what you're looking for.

Edit: actually, for your question it would be the inverse of the batch size multiplied by 100, e.g. the first batch has 50 requests, so that would be 1/50 × 100, or 2%.

No, I am not using Kafka. It is just a basic Python server. I want to calculate the nth percentile of the size of my incoming request objects over the past hour. I don't want to store the size of each request in memory; it would eat too much of my RAM.

Incoming traffic:

- 1st batch --> 60 requests
    ---> size of 1st request is 10 KB
    ---> size of 2nd request is 2 KB
    ...
- 2nd batch --> 10 requests
    ---> size of 1st request is 5 KB
    ---> size of 2nd request is 8 KB
    ...
- ...
- 100th batch

I am talking about the 10th, 50th, and 95th percentiles of request size.
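One common way to get approximate percentiles over a sliding window without keeping every size is to store only counts in fixed, log-spaced size buckets, keep one such histogram per minute, and drop minutes older than an hour. Memory then depends on the number of buckets and slots, not on traffic (roughly 60 × 150 integers with the defaults below). What follows is a minimal sketch under those assumptions: the class `WindowedSizePercentiles`, its methods, and its parameters (60-second slots, buckets growing by about 10%, a 10 MiB cap, sizes in bytes) are all hypothetical, not from any library. Dedicated sketches such as t-digest or HdrHistogram are more refined alternatives; this hand-rolled version only illustrates the idea.

```python
import bisect
import math
import time
from collections import deque


class WindowedSizePercentiles:
    """Hypothetical helper: approximate percentiles of request body sizes
    (in bytes) over a sliding time window, using one log-bucketed
    histogram per time slot. Memory is O(slots * buckets), independent
    of how many requests arrive."""

    def __init__(self, window_seconds=3600, slot_seconds=60,
                 max_size=10 * 1024 * 1024, growth=1.1):
        if slot_seconds >= window_seconds:
            raise ValueError("slot_seconds must be smaller than window_seconds")
        self.window_seconds = window_seconds
        self.slot_seconds = slot_seconds
        # Bucket upper bounds: 1, 2, 3, ... then each about `growth` times
        # the previous one, until max_size is covered.
        self.bounds = [1]
        while self.bounds[-1] < max_size:
            prev = self.bounds[-1]
            self.bounds.append(max(prev + 1, math.ceil(prev * growth)))
        # (slot_start, counts) pairs, oldest first; counts[i] is the number
        # of sizes s with bounds[i-1] < s <= bounds[i].
        self.slots = deque()

    def _expire(self, now):
        # Drop slots that started a full window ago or earlier.
        while self.slots and self.slots[0][0] <= now - self.window_seconds:
            self.slots.popleft()

    def add(self, size, now=None):
        now = time.monotonic() if now is None else now
        slot_start = now - (now % self.slot_seconds)
        if not self.slots or slot_start > self.slots[-1][0]:
            self.slots.append((slot_start, [0] * len(self.bounds)))
        self._expire(now)
        # Smallest bound >= size; oversized values are clamped into the last bucket.
        i = min(bisect.bisect_left(self.bounds, size), len(self.bounds) - 1)
        self.slots[-1][1][i] += 1

    def percentile(self, p, now=None):
        """Nearest-rank p-th percentile (0-100), or None if the window is empty."""
        if not 0 <= p <= 100:
            raise ValueError("p must be between 0 and 100")
        now = time.monotonic() if now is None else now
        self._expire(now)
        # Merge the per-slot histograms still inside the window.
        totals = [0] * len(self.bounds)
        for _, counts in self.slots:
            for i, c in enumerate(counts):
                totals[i] += c
        n = sum(totals)
        if n == 0:
            return None
        rank = max(1, math.ceil(p * n / 100))
        seen = 0
        for bound, c in zip(self.bounds, totals):
            seen += c
            if seen >= rank:
                return bound  # upper edge of the bucket holding that rank
```

In a server you would call something like `tracker.add(len(body))` per request and `tracker.percentile(95)` when you need the value. The result is the upper edge of the bucket containing the requested rank, so it can overstate the true value by up to about the `growth` factor (10% here); sizes above `max_size` are clamped into the last bucket and under-reported. The window advances in whole slots, so "past hour" effectively means the last 59 to 60 minutes.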