Recommendations when publishing a WASM library

40 points by comagoosie 4 years ago · 21 comments (20 loaded)

jitl 4 years ago

Wow, this is a great resource! I’ve been dealing with a lot of these issues with a JS sandbox library I’m building, quickjs-emscripten. I want to offer a ton of different build variants of my core WASM file. Here’s what I tried: https://github.com/justjake/quickjs-emscripten/blob/master/t...

This approach works great for Node.js, but once I ran a test bundle I found that Webpack (and bundlephobia) included all the base64 “release” variants instead of lazy-loading the import statements. Bummer. I had assumed this was because TypeScript on its own compiled import to Promise.resolve(require(…)), so it’s good to know that most bundlers will STILL get this wrong even if I’m emitting ES6 module import syntax. Yikes! I need to bite the bullet and start using Rollup to emit a slew of separate entry points. Oy vey.

Anyways A+++ would read again. This will save me 4-5 days of work stubbing my toe on bundlers and build systems, which is the Least Fun part of JS work.
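
A minimal sketch of the "one lazy loader per build variant" idea described above. The variant names and the `./variants/debug.js` path are hypothetical, and the thunks here are stand-ins so the sketch runs without files; in a real package each thunk would be a dynamic `import()` of its own entry point. Whether a given bundler actually splits those into separate chunks is bundler-dependent and not something this sketch guarantees.

```javascript
// Sketch: one lazy loader per build variant, so nothing is loaded until asked for.
// In a real package each thunk would be a dynamic import of its own entry point,
// e.g. () => import("./variants/debug.js")  (hypothetical path).
function createVariantLoader(loaders) {
  const cache = new Map();
  return function loadVariant(name) {
    if (!cache.has(name)) {
      const loader = loaders[name];
      if (!loader) {
        return Promise.reject(new Error(`unknown variant: ${name}`));
      }
      // The executor runs synchronously; a throwing loader becomes a rejection.
      cache.set(name, new Promise((resolve) => resolve(loader())));
    }
    return cache.get(name);
  };
}

// Stand-in thunks so the sketch runs without any files on disk.
let loadCount = 0;
const loadVariant = createVariantLoader({
  release: () => { loadCount += 1; return { name: "release" }; },
  debug: () => { loadCount += 1; return { name: "debug" }; },
});
```

Calling `loadVariant("debug")` twice returns the same promise and runs the loader only once, so unused variants cost nothing at runtime.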

Shadonototra 4 years ago

Security-wise, it is not a good idea to consume WASM libraries "as is": ask for the source, read it, and compile it yourself.

You don't want to be in a position where you ship binary code to production that could potentially be harmful.

Off topic: Please don't mess with the way my browser scrolls pages, it is infuriating.

aseipp 4 years ago

In the case of NPM consumers of wasm libraries, building it yourself often isn't a realistic option, since they won't have the toolchain needed to build the code. Rust is a lot more well oiled than C/C++ in this regard, but it's a bit of a hassle to line things up and keep things reliable and reproducible. (Good luck if the blob is not only C/C++ but uses 3rd party dependencies on top of that, which often require even more hurdles.) If you don't use the pre-built blob you'll often have to 'insert' it otherwise into the library, somehow, which is its own chore. So it's all a bit chicken and egg at some level. Now, it's not like a lot of these packages ever followed best practices in this regard (I'm reminded of many Ruby, Python, JS packages that love to ship random .so files, and often do it incorrectly) but it is what it is.

That said I generally agree with the premise, and even with sandboxing you should vet dependencies like these where appropriate if you can. A good example of this is something like an image decoder versus a database library (both of these being real scenarios; e.g. using a pure-Rust implementation of some SQL protocol). The first one I probably wouldn't worry too much about; you're just giving it pixels in and getting pixels out. But the second one is likely worth a bit of scrutiny since it interfaces directly with a sensitive component.

dgb23 4 years ago

Can you elaborate on that? From what I know, WASM is a safe, rather abstract bytecode format and has far fewer API capabilities than JS has (which is why you need to call it from JS to affect the browser).
- johannes1234321 4 years ago
  
  The concern is the same as with any dependency: The dependency runs under your privileges with access to your data. A malicious vendor could do "anything", at least within the scope of your application.
  For instance if you create a web mail application the code probably has access to all mails, can delete them, can send mail under the user's identity, ...
  How relevant those scenarios are you have to evaluate.
  If you compile yourself, you can verify the source to increase trust. If you just get the binary, you have to trust the vendor more.
  - dgb23 4 years ago
    
    But again, it's not "just" a binary. You provide the interfaces to it that actually affect the world. I agree with the general implications of using a dependency though.
    
    gwbas1c 4 years ago
    
    At least in in-browser Blazor (C# compiled to WASM) I have full access to the page's Javascript environment: I can call most Javascript methods available, and I can even call eval.
    I'm still new to WASM, but I assume this functionality is part of the WASM runtime? (As opposed to a hook that's part of the Javascript part of Blazor?)
    
    jitl 4 years ago
    
    You are assuming wrong, this is Blazor’s JS support library. By “default” WebAssembly has nothing outside the WASM bytecode, not even memory/allocator. It’s up to the host environment to provide the WASM module a Memory and whatever “imports” it needs.
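
To illustrate the point that a module can only reach what the host passes in, here is a minimal sketch using a tiny hand-assembled module (the bytes, the `env.log` import name, and the `run` export are made up for this example and do not come from Blazor or any real library). The module imports one function and exports one function that calls it.

```javascript
// Hand-assembled module, equivalent to:
//   (module
//     (import "env" "log" (func $log (param i32)))
//     (func (export "run") (call $log (i32.const 42))))
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x01, 0x08, 0x02, 0x60, 0x01, 0x7f, 0x00, 0x60, 0x00, 0x00, // two function types
  0x02, 0x0b, 0x01, 0x03, 0x65, 0x6e, 0x76, 0x03, 0x6c, 0x6f, 0x67, 0x00, 0x00, // import env.log
  0x03, 0x02, 0x01, 0x01, // one local function, of type 1
  0x07, 0x07, 0x01, 0x03, 0x72, 0x75, 0x6e, 0x00, 0x01, // export "run" = function 1
  0x0a, 0x08, 0x01, 0x06, 0x00, 0x41, 0x2a, 0x10, 0x00, 0x0b, // body: i32.const 42; call 0
]);
const wasmModule = new WebAssembly.Module(wasmBytes);

// The host decides what "env.log" means; the module cannot pick its own.
const received = [];
const instance = new WebAssembly.Instance(wasmModule, {
  env: { log: (value) => { received.push(value); } },
});
instance.exports.run();

// Without the import the module cannot even be instantiated.
let failedWithoutImports = false;
try {
  new WebAssembly.Instance(wasmModule, {});
} catch (err) {
  failedWithoutImports = true;
}
```

The only effect `run` can have on the outside world is through the `log` function the host chose to supply.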
    
    klibertp 4 years ago
    
    Memory can also be statically reserved in a binary, like the .data section in x86 asm. IIRC (tell me if I'm very wrong) Emscripten malloc works by reserving a large buffer up front and then managing pointers to (parts of) it. Otherwise, you can allocate memory buffer dynamically from JS[1] and pass the pointer to it to the WASM. Obviously, you can do this in response to the request coming from WASM, but you don't have to.
    [1] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
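
A rough sketch of allocating linear memory from the JS side and handing a pointer (an offset) plus a length to WASM. The offset `1024` and the `writeBytes` helper are assumptions for illustration; a real module and its allocator would need to agree on who owns which region.

```javascript
const PAGE_SIZE = 64 * 1024; // WebAssembly pages are 64 KiB

// One page to start, growable up to four pages.
const memory = new WebAssembly.Memory({ initial: 1, maximum: 4 });

// Copy bytes into linear memory at an offset chosen by the host.
function writeBytes(mem, offset, bytes) {
  new Uint8Array(mem.buffer, offset, bytes.length).set(bytes);
  return { ptr: offset, len: bytes.length };
}

const pixels = Uint8Array.of(255, 0, 0, 255); // one red RGBA pixel
const { ptr, len } = writeBytes(memory, 1024, pixels);

const bufferBeforeGrow = memory.buffer;
const previousPages = memory.grow(1); // returns the old size in pages

// grow() detaches the old ArrayBuffer, so views must be recreated from memory.buffer.
const readBack = Array.from(new Uint8Array(memory.buffer, ptr, len));
```

The `ptr` and `len` numbers are what would be passed to an exported WASM function; the data itself survives `grow()`, but any typed-array view created earlier does not.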
    
    johannes1234321 4 years ago
    
    It is true that you have to expose things first. However, as soon as any non-trivial object is shared from JS to WASM, it is likely that some reference to something global, to the DOM, ... is nested in there. And that makes all other attempts to limit access void. (The DOM gives networking, etc.)
    
    klibertp 4 years ago
    
    I think you actually can't pass any non-trivial object to WASM. The only thing you can pass to and get out of WASM (IIRC) is a chunk of linear memory, a buffer. I don't think it's possible to obtain a raw pointer to an object on the JS side - passing that would be very unsafe, but also pretty much useless, unless you knew precisely the memory layout of said object. If you need to expose operations on more complex objects to WASM, you need to encode the object identity somehow (as a number, or a buffer) and expose an API that will decode the object reference and call the needed function/method on it.
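
The "encode the object identity as a number" approach could look roughly like the handle table below. The class and the `counter_*` import names are hypothetical, and the functions are called directly from JS here rather than from a real module.

```javascript
// Maps small integers ("handles") to JS objects so WASM code only ever sees numbers.
class HandleTable {
  constructor() {
    this.objects = new Map();
    this.nextHandle = 1; // 0 is reserved as a "null" handle
  }
  alloc(object) {
    const handle = this.nextHandle++;
    this.objects.set(handle, object);
    return handle;
  }
  get(handle) {
    if (!this.objects.has(handle)) throw new Error(`invalid handle: ${handle}`);
    return this.objects.get(handle);
  }
  free(handle) {
    this.objects.delete(handle);
  }
}

const handles = new HandleTable();

// Hypothetical import object: every function takes and returns plain numbers.
const imports = {
  env: {
    counter_new: () => handles.alloc({ count: 0 }),
    counter_increment: (handle) => (handles.get(handle).count += 1),
    counter_free: (handle) => handles.free(handle),
  },
};

// Called directly here; a WASM module would call them through its imports.
const h = imports.env.counter_new();
imports.env.counter_increment(h);
const countAfterTwo = imports.env.counter_increment(h);
```

The module can only do what these three functions allow, which is the property being argued for: the exposed API, not the binary, defines the reachable surface.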
    
    johannes1234321 4 years ago
    
    When using emscripten you get the full marshalling of objects.
    Doing a network request from C++:
      val xhr = val::global("XMLHttpRequest").new_();
      xhr.call<void>("open", std::string("GET"), std::string("http://url"));
    https://emscripten.org/docs/api_reference/val.h.html#val-h
    Of course this can be thrown out of the JavaScript binding side so that it isn't there anymore, but the marshalling makes the API nice and so many emscripten users will use it, thus it's present, thus you have to trust the wasm library.
    How complicated this is without emscripten I don't know, but even then I guess many people will need some marshalling for real life scenarios.
    
    klibertp 4 years ago
    
    As I said, you need to encode object identity (here - as a string to look up in the global namespace) and handle requests for doing things with that object (encoded as a method name + a list of strings as arguments) on the JS side. Strings are probably encoded as pointer + length, and the JS side takes care of locating the object, decoding arguments into JS objects, performing the action, and returning the result, encapsulated in a val class instance.
    But, you're right, I forgot about this part of emscripten, my bad :) My use case was passing 2 MB of image data to WASM, which was simple to do with just WebAssembly.Memory, so I didn't get the chance to use this part of the FFI (I used the part going in the opposite direction[1]). I don't know the details of the val implementation, nor the details of JS-side handling the calls, but the basic principle should be as I said: the only "things" you can pass between JS and WASM are numbers and linear buffers. To do anything above that you need support on the JS side and some kind of encoding/serialization, similarly to what you do in IPC/RPC. EDIT: I also suspect it's possible to exclude/disable val.h with a define flag (I didn't check this though).
    [1] https://emscripten.org/docs/api_reference/bind.h.html#_CPPv4...
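
A sketch of the JS-side half of that principle: strings arrive as (pointer, length) pairs, get decoded from linear memory, and objects are referred to by number. This is not how Emscripten's val is actually implemented (those details are not covered here); `getGlobal`, `callMethod0`, and `putString` are hypothetical names, and `putString` only simulates what the WASM side would write.

```javascript
const memory = new WebAssembly.Memory({ initial: 1 });
const objects = [undefined]; // index = handle; slot 0 is reserved

// Decode a UTF-8 string the module left in linear memory at (ptr, len).
function readString(ptr, len) {
  return new TextDecoder().decode(new Uint8Array(memory.buffer, ptr, len));
}

// Hypothetical import: look up a global by name and hand back a numeric handle.
function getGlobal(namePtr, nameLen) {
  const value = globalThis[readString(namePtr, nameLen)];
  return objects.push(value) - 1;
}

// Hypothetical import: call a zero-argument method on the object behind a handle.
function callMethod0(handle, namePtr, nameLen) {
  const target = objects[handle];
  const result = target[readString(namePtr, nameLen)]();
  return objects.push(result) - 1;
}

// Simulate the WASM side writing strings into its own memory.
function putString(ptr, text) {
  const bytes = new TextEncoder().encode(text);
  new Uint8Array(memory.buffer, ptr, bytes.length).set(bytes);
  return bytes.length;
}

const dateLen = putString(0, "Date");
const nowLen = putString(16, "now");
const dateHandle = getGlobal(0, dateLen);
const resultHandle = callMethod0(dateHandle, 16, nowLen);
```

Note that an import like `getGlobal` hands the module the entire global scope by name, which is the trust concern raised earlier in this sub-thread.
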
  - jitl 4 years ago
    
    The great thing about WASM is that you don’t need to audit the binary - just the code that touches the binary through the WebAssembly.* namespace. If the code looks too complicated, or exposes eval or equivalent capabilities like arbitrary JS function calls, then you should approach with caution and build yourself, etc etc.
    Most WASM libraries I’ve considered using (and the one I package myself) use an off-the-shelf Emscripten wrapper minified with Google Closure Compiler. This is annoying to audit compared to plain JS, but certainly doable with a few rename-symbol in your editor.
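
As a complement to reading the JS wrapper, the standard `WebAssembly.Module.imports` and `WebAssembly.Module.exports` functions list what a binary asks for and offers. The sketch below uses a tiny hand-assembled module (the `env.log` import and `run` export are made up for this example); with a real library you would pass the bytes of its .wasm file instead.

```javascript
// Hand-assembled module, equivalent to:
//   (module
//     (import "env" "log" (func $log (param i32)))
//     (func (export "run") (call $log (i32.const 42))))
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x01, 0x08, 0x02, 0x60, 0x01, 0x7f, 0x00, 0x60, 0x00, 0x00, // two function types
  0x02, 0x0b, 0x01, 0x03, 0x65, 0x6e, 0x76, 0x03, 0x6c, 0x6f, 0x67, 0x00, 0x00, // import env.log
  0x03, 0x02, 0x01, 0x01, // one local function, of type 1
  0x07, 0x07, 0x01, 0x03, 0x72, 0x75, 0x6e, 0x00, 0x01, // export "run" = function 1
  0x0a, 0x08, 0x01, 0x06, 0x00, 0x41, 0x2a, 0x10, 0x00, 0x0b, // body: i32.const 42; call 0
]);
const wasmModule = new WebAssembly.Module(wasmBytes);

// Each entry describes one thing the host must supply, or one thing the module offers.
const importList = WebAssembly.Module.imports(wasmModule);
const exportList = WebAssembly.Module.exports(wasmModule);

for (const entry of importList) {
  console.log(`import ${entry.module}.${entry.name} (${entry.kind})`);
}
for (const entry of exportList) {
  console.log(`export ${entry.name} (${entry.kind})`);
}
```

A short import list with narrowly scoped functions is easier to trust than one that includes a generic "call any JS function" entry; the list shows names and kinds only, so the wrapper code behind each import still needs reading.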

modeless 4 years ago

I really hate that base64 is the best option for embedding inline assets in HTML/JS files. Have there been any proposals to add real heredoc support to HTML/JS? With the ability to choose delimiters and support for unrestricted binary data?
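
For context, the base64 inlining being complained about looks roughly like this. The string is the smallest valid module (just the 8-byte header), used only as a stand-in; the sketch assumes a runtime with a global `atob` (browsers and recent Node.js).

```javascript
// The smallest valid module: the 4-byte magic "\0asm" plus a 4-byte version.
const EMPTY_MODULE_BASE64 = "AGFzbQEAAAA=";

// Turn the inlined text back into bytes the WebAssembly API accepts.
function decodeBase64(text) {
  const binary = atob(text); // one character per decoded byte
  return Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
}

const bytes = decodeBase64(EMPTY_MODULE_BASE64);
const isValid = WebAssembly.validate(bytes);

// Encoded size relative to raw size; for large files this approaches 4/3.
const overhead = EMPTY_MODULE_BASE64.length / bytes.length;
```

The cost is the larger source text plus a decode pass at startup, which is why a binary-safe inline syntax would be attractive.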

kylebarron 4 years ago

Looks to be a great resource. I've been working on a WASM implementation of reading and writing Apache Parquet [0] and it's been difficult being new to WASM to find the best way of distributing the WASM that works on Node and through bundlers like Webpack.

[0]: https://github.com/kylebarron/parquet-wasm

westurner 4 years ago

conda: "Adding a WebAssembly platform" https://github.com/conda/conda/issues/7619

  pyodide.loadPackage("numpy");
  pyodide.loadPackage("https://foo/bar/numpy.js");

  # import micropip
  micropip.install('https://example.com/files/snowballstemmer-2.0.0-py2.py3-none-any.whl')

"Creating a Pyodide package" > "2. Creating the meta.yaml file" https://pyodide.org/en/stable/development/new-packages.html#...

conda-forge: "WASM as a supported architecture" https://github.com/conda-forge/conda-forge.github.io/issues/...

kansface 4 years ago

Does anyone have practical experience running WASM in long running processes on node, as compared to something like Neon? Writing C in TS doesn't look particularly appealing to me, but node extensions bring their own set of problems.

whb07 4 years ago

Write it in Rust.
