Biomedical knowledge graph-backed service: seeking collaborators
Hi HN. Please find herein a brief description of a project I've been working on and why I'm posting.
Problem space: Improve the efficiency with which researchers / students can consume and utilise information from biomedical publications.
Background and use cases: During my time in the biomedical industry, I have come across several cases where the class of tool I'm envisaging would be productive, but I have not been able to find a satisfactory solution. Some (abstract) cases:
- What are the consequences of modulating a certain biological entity?
- What biological changes could cause a certain consequence?
- What relational paths between process / chemical / outcome A and B can be inferred from published findings?
Solution: a system that:
- Automatically compiles relationships between biomedical concepts from publications.
- Models them as a knowledge graph, allowing traversal queries to form chains of relations (inference).
- Is queryable through an interface / API optimised for the above cases.
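As a minimal sketch of the "chains of relations" idea, assuming extracted relations are already stored as (subject, predicate, object) triples, the plain-Python code below indexes the triples as an adjacency list and runs a bounded breadth-first search to list every relation chain between two entities (the third use case above). The triples, entity names and function names are hypothetical placeholders; a knowledge graph such as GraKn would express this through its own schema and query language rather than this code.

```python
from collections import defaultdict, deque

# Hypothetical triples standing in for relations extracted from publications.
TRIPLES = [
    ("drug_X", "inhibits", "enzyme_E"),
    ("enzyme_E", "produces", "metabolite_M"),
    ("metabolite_M", "promotes", "inflammation"),
    ("drug_X", "binds", "receptor_R"),
    ("receptor_R", "activates", "pathway_P"),
    ("pathway_P", "promotes", "inflammation"),
]


def build_graph(triples):
    """Index triples as an adjacency list: subject -> [(predicate, object), ...]."""
    graph = defaultdict(list)
    for subj, pred, obj in triples:
        graph[subj].append((pred, obj))
    return graph


def find_paths(graph, start, goal, max_hops=3):
    """Breadth-first search for all simple relation chains from start to goal,
    at most max_hops edges long. Each path is a list of (subject, predicate, object)."""
    paths = []
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if node == goal and path:
            paths.append(path)
            continue
        if len(path) >= max_hops:
            continue
        # Nodes already on this chain are skipped so paths never loop.
        visited = {start} | {obj for _, _, obj in path}
        for pred, obj in graph.get(node, []):
            if obj not in visited:
                queue.append((obj, path + [(node, pred, obj)]))
    return paths


def format_path(path):
    """Render a chain as 'A -pred-> B -pred-> C'."""
    text = path[0][0]
    for _, pred, obj in path:
        text += f" -{pred}-> {obj}"
    return text


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    for chain in find_paths(graph, "drug_X", "inflammation"):
        print(format_path(chain))
```

The `max_hops` bound plays the role of "degrees of separation": raising it surfaces longer, more speculative chains at the cost of a larger result set.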
Status: I have a relation extraction system that is fit for a prototype (built in Python on top of spaCy), and I have experimented with a knowledge graph implementation (in GraKn) and am encouraged to continue. This took 3-4 months of part-time effort, building on previous experience of "text mining" activities on scientific content.
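To illustrate what a relation extractor produces, here is a deliberately simplified sketch. The system described above is built on spaCy; this version instead uses a plain regular expression so it runs without downloading a language model. It splits text into sentences, matches "<entity> <trigger verb> <entity>", normalises mentions to canonical IDs via a synonym table (a toy version of the ontological mapping discussed later in the thread), and drops relations involving unknown entities. The synonym table, trigger verbs and function names are all hypothetical.

```python
import re

# Hypothetical synonym table: surface form (lower-cased) -> canonical entity ID.
SYNONYMS = {
    "acetylsalicylic acid": "aspirin",
    "aspirin": "aspirin",
    "cyclooxygenase-1": "COX1",
    "cox-1": "COX1",
    "prostaglandin synthesis": "prostaglandin_synthesis",
}

# Hypothetical trigger verbs mapped to normalised predicates.
PREDICATES = {
    "inhibits": "decreases",
    "suppresses": "decreases",
    "reduces": "decreases",
    "activates": "increases",
    "induces": "increases",
    "increases": "increases",
}

# "<subject> <trigger verb> <object>", with an optional trailing full stop.
VERB_PATTERN = re.compile(
    r"^(?P<subj>.+?)\s+(?P<verb>" + "|".join(PREDICATES) + r")\s+(?P<obj>.+?)\.?$",
    re.IGNORECASE,
)


def normalise(mention):
    """Map a surface mention to a canonical ID, or None if it is not in the vocabulary."""
    return SYNONYMS.get(mention.strip().lower())


def extract_triples(text):
    """Emit (subject, predicate, object) triples for sentences matching the pattern."""
    triples = []
    for sentence in re.split(r"(?<=\.)\s+", text.strip()):
        match = VERB_PATTERN.match(sentence)
        if not match:
            continue
        subj = normalise(match.group("subj"))
        obj = normalise(match.group("obj"))
        if subj and obj:
            triples.append((subj, PREDICATES[match.group("verb").lower()], obj))
    return triples


if __name__ == "__main__":
    sample = ("Acetylsalicylic acid inhibits COX-1. "
              "Aspirin reduces prostaglandin synthesis. "
              "Patients were followed for six months.")
    for triple in extract_triples(sample):
        print(triple)
```

Note how "Acetylsalicylic acid" and "Aspirin" collapse onto the same node; without that mapping step the graph would fragment into disconnected synonyms. A real extractor would rely on dependency parses and named-entity recognition rather than a surface regex.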
I would like to:
- Discuss with anyone interested.
- Find potential collaborators to build a proof-of-capability system to demonstrate to some labs within the next 2-3 months.
- Be challenged on the thinking and made aware of related projects (if something like this already exists, sign me up).
Desirable traits in a collaborator:
- Skills that complement mine, i.e., foremost, strength in front-end development / design.
- A scientific background is a plus.
Don't hesitate to hit me up for a chat and more information. Cheers!
This is a big area with many active researchers. Your problem definition / solution seems a bit fuzzy to me. What do you want your system to do that, e.g., Aristo doesn't do? https://allenai.org/aristo/
There are philosophical differences and practical differences. They are trying to build an AI scientist from the ground up (love it). I would like to optimise for front-line use cases in a lean, user-driven manner. The common technological denominator is knowledge extraction (text to subject-predicate-object triples), ontological mapping (e.g., these relationships express the same thing, these references are synonyms of this compound, etc.), and reasoning (which comes for free once information is properly extracted and mapped onto a schema, thanks to knowledge graph implementations such as GraKn).
Practically speaking, Aristo takes questions as unstructured text and answers in unstructured text. I'm interested in supporting mechanistic queries that return comprehensive, highly structured result sets. For a question such as "what are the biological consequences of increasing the activity of molecule A?", I want tabular, filterable results (where the number of rows depends on the volume of underlying data and the degrees of separation you carry the inference to). For this reason (alongside its current limitation to elementary science), I argue that Aristo is not currently a relevant resource for researchers and students looking to query and survey biomedical relationships.
The solution I'm aiming at takes a structured query and returns structured results. For example, the query [entity: molecule A, direction: increase] generates a list of direct and inferred consequences. It is more like a logic-driven search engine over structured information than a question-answering system.
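To make a query like [entity: molecule A, direction: increase] concrete, here is a minimal sketch assuming relations have already been extracted and normalised into signed edges (+1 for "up-regulates", -1 for "down-regulates"). It walks the graph breadth-first, multiplies signs along each chain to infer the direction of each downstream effect, and returns one row per reached entity with its degree of separation and supporting chain, so results can be tabulated and filtered. The edge list, `Row` and `consequences` are hypothetical, and this is plain Python rather than GraKn's own rule engine.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical signed relations: +1 = subject up-regulates object, -1 = down-regulates.
EDGES = [
    ("molecule_A", +1, "kinase_B"),
    ("kinase_B", -1, "protein_C"),
    ("protein_C", +1, "apoptosis"),
    ("molecule_A", -1, "cytokine_D"),
]


@dataclass
class Row:
    entity: str
    direction: str  # "increase" or "decrease"
    hops: int       # degrees of separation from the query entity
    path: str       # human-readable chain of supporting relations


def consequences(edges, entity, direction, max_hops=3):
    """Propagate a perturbation (entity, direction) through signed edges.
    Each entity is reported once, at its shortest hop distance (first BFS visit)."""
    adjacency = {}
    for subj, sign, obj in edges:
        adjacency.setdefault(subj, []).append((sign, obj))
    start_sign = +1 if direction == "increase" else -1
    rows, seen = [], {entity}
    queue = deque([(entity, start_sign, 0, entity)])
    while queue:
        node, sign, hops, path = queue.popleft()
        if hops >= max_hops:
            continue
        for edge_sign, obj in adjacency.get(node, []):
            if obj in seen:
                continue
            seen.add(obj)
            new_sign = sign * edge_sign  # e.g. increase x down-regulates = decrease
            new_path = f"{path} -[{'+' if edge_sign > 0 else '-'}]-> {obj}"
            rows.append(Row(obj, "increase" if new_sign > 0 else "decrease", hops + 1, new_path))
            queue.append((obj, new_sign, hops + 1, new_path))
    return rows


if __name__ == "__main__":
    for row in consequences(EDGES, "molecule_A", "increase"):
        print(f"{row.entity:<12} {row.direction:<9} {row.hops}  {row.path}")
    # Filtering is an ordinary list comprehension over the rows:
    direct = [r for r in consequences(EDGES, "molecule_A", "increase") if r.hops == 1]
```

One simplification worth flagging: because each entity is kept only at its first (shortest) visit, conflicting evidence, where two chains reach the same entity with opposite signs, is silently dropped. A real system would need to surface such conflicts rather than hide them.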