Ask HN: How to return large response over an API?

2 points by cloverr20 5 years ago · 9 comments · 1 min read


I am using a third-party service that transcribes an audio file and returns the entire transcription as JSON. I read this, transform it into a new JSON file, and send that to S3. I am using Python/Django along with the json, requests and boto3 modules. However, I noticed that memory consumption kept increasing as the application kept running.

To fix this I had to write the streaming response to a JSON file and use ijson (https://pypi.org/project/ijson/) to read the content, which decreased memory utilisation a lot. While looking around, I found many people had the same issue, which got me thinking: how would I have sent this content if I had built the third-party API myself?

Some previous questions asked on Stack Overflow:

https://stackoverflow.com/questions/2400643/is-there-a-memory-efficient-and-fast-way-to-load-big-json-files-in-python

https://stackoverflow.com/questions/11057712/huge-memory-usage-of-pythons-json-module

How do you send very large content in an API response, and what are the advantages/disadvantages of each approach?

aardvarkr 5 years ago

Sounds like this is a personal problem and not a them problem. They’re sending you a json and you’re complaining that your memory usage is exploding. That’s on you. If you’re not actively cleaning up after yourself idk how you can blame the external api for your internal memory issues. That’s my two cents at least. You can’t possibly design an api that fixes a user’s poor design.

  • cloverr20OP 5 years ago

    Hi, I have updated the post with some additional Stack Overflow links; you can see many people have struggled with similar issues over time.

    • aardvarkr 5 years ago

      Bad design is endemic. Just because other people made the same mistake as you doesn’t mean it’s the only option.

      Think about it. They’re sending you data. You’re processing that data. If you don’t delete that data then it’s going to persist especially if you don’t do anything about it. Here’s a suggestion - do something about it! Clear unnecessary data immediately after it’s not needed so you don’t have to rely on python/django’s crappy garbage collection.

tiew9Vii 5 years ago

If you can chunk the response, json-seq or XML (with a SAX parser) may be worth looking at.

The server can incrementally stream chunks and the client can incrementally consume them, keeping memory usage flat.
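
As a rough illustration of the json-seq idea (RFC 7464), here is a minimal sketch using only the Python standard library. Each record is framed by a record-separator byte and a line feed, so the sender can emit one record at a time and the reader can parse each record as soon as it is complete. The function names and the shape of the records are assumptions made for the example, not part of any particular API.

```python
import io
import json
from typing import Any, BinaryIO, Iterable, Iterator

RS = b"\x1e"  # ASCII record separator that starts each record in RFC 7464
LF = b"\n"    # line feed that ends each record


def encode_records(records: Iterable[Any]) -> Iterator[bytes]:
    """Yield one framed record at a time, so the full body is never built."""
    for record in records:
        yield RS + json.dumps(record).encode("utf-8") + LF


def decode_records(stream: BinaryIO, chunk_size: int = 4096) -> Iterator[Any]:
    """Parse records as bytes arrive, buffering only the unfinished tail."""
    buffer = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        # Every piece before the last RS is a complete record;
        # the final piece may still be growing, so keep it buffered.
        *complete, buffer = buffer.split(RS)
        for raw in complete:
            if raw.strip():
                yield json.loads(raw)
    if buffer.strip():
        yield json.loads(buffer)


if __name__ == "__main__":
    # Hypothetical transcript segments, generated lazily.
    segments = ({"start": i, "text": f"segment {i}"} for i in range(3))
    body = b"".join(encode_records(segments))
    for record in decode_records(io.BytesIO(body), chunk_size=8):
        print(record)
```

In a real service the generator from `encode_records` would be handed to whatever streaming response mechanism the framework offers, and `decode_records` would read from the HTTP response body instead of a `BytesIO`.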

I think gRPC also supports streaming, but I don’t know much about it.

JSON is a bad format for large files because, as you observed, you generally need to read the entire file into memory before you can use it.

  • cloverr20OP 5 years ago

    This is the exact problem I have been referring to. Parsing JSON without loading it entirely is difficult; I will look into XML with a SAX parser. Could YAML be considered for a streaming response approach?

pestatije 5 years ago

JSON is the default go-to standard for web services nowadays, but that doesn't mean it is the best format for your requirements. I'm not sure what an audio transcription is exactly, but if you can "stream" it you don't need JSON at all. Just use some basic serializer and stream that instead.

  • cloverr20OP 5 years ago

    Audio transcription means speech to text. Once I get the JSON, I need to make some modifications to it and store it in the cloud as a JSON file. However, to do that I need to load it entirely into memory, make my changes and then write it to a file.

    The problem I am facing is that a lot of memory is used when the object is loaded, and that memory is not fully freed afterwards, so the difference accumulates gradually. So I was looking for formats which can be processed without keeping the entire text in memory.

    • pestatije 5 years ago

      Right, I see two different problems here: the json lib apparently leaking, and deciding the best format for info transfer.

      For the first one: look in the specific json lib's forums, ask there, check open issues, open an issue. (All those Stack Overflow questions/answers are really confusers, not helpers.)

      For the second one: so it's text, I'd transfer text then. It's too much text? Process it in chunks. Your final format is JSON? Encapsulate your text as JSON at the very last step.
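
A minimal sketch of that last suggestion, assuming the transcript arrives as plain text and the final file only needs a single JSON string field: the text is read in fixed-size chunks and each chunk is escaped and written out immediately, so the whole transcript is never held in memory at once. The key name `transcript` and both function names are hypothetical.

```python
import io
import json
from typing import Iterable, Iterator, TextIO


def read_chunks(source: TextIO, size: int = 64 * 1024) -> Iterator[str]:
    """Yield the text in pieces of at most `size` characters."""
    while True:
        chunk = source.read(size)
        if not chunk:
            return
        yield chunk


def write_text_as_json(chunks: Iterable[str], out: TextIO,
                       key: str = "transcript") -> None:
    """Write {"<key>": "<all chunks>"} without joining the chunks in memory."""
    out.write("{" + json.dumps(key) + ': "')
    for chunk in chunks:
        # json.dumps escapes quotes, backslashes and control characters;
        # dropping its surrounding quotes leaves a fragment of a JSON string.
        out.write(json.dumps(chunk)[1:-1])
    out.write('"}')


if __name__ == "__main__":
    source = io.StringIO('She said "hello".\nThen the line went quiet.')
    out = io.StringIO()
    write_text_as_json(read_chunks(source, size=8), out)
    print(out.getvalue())
```

Here `out` is an in-memory buffer only to keep the sketch self-contained; in practice it could be a file opened for writing that is uploaded once it is closed.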

PEJOE 5 years ago

If you know the structure of the file so that you are comfortable reading it a bit at a time, and you’re confident the structure won’t change, C has several ways to control how much of the file you read in. You could just write a python module to handle this situation.
