Can someone build Roam x Mathematica?

Can I ask for help/advice? 

I really want a @Roam x @Mathematica tool for making explicit, data-filled knowledge graphs of new fields. What’s a good way to hack a pipeline until someone builds this?


I often want to understand new fields. To do this, I need to both build an idea graph and compile evidence for nodes in the graph. 

How I learn new fields:

  1. Build an idea graph (Roam, Google Sheets)
  2. Capture key papers + databases for nodes on the graph (local folders, Google Drive)
  3. Extract information from key papers + databases (Google Sheets, export to pandas or other programming)
  4. Graph extracted information (Prism, Matplotlib, Google Sheets charts, Jupyter or Mathematica notebook)
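Steps 3 and 4 above can be hacked together today with pandas and Matplotlib. A minimal sketch, where the node names, papers, and numbers are all hypothetical stand-ins for values extracted by hand:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs as a script
import matplotlib.pyplot as plt

# Step 3: hypothetical values extracted by hand from key papers,
# keyed to nodes of the idea graph.
records = [
    {"node": "enzyme kinetics",  "paper": "Paper A", "kcat_per_s": 120.0},
    {"node": "enzyme kinetics",  "paper": "Paper B", "kcat_per_s": 95.0},
    {"node": "expression level", "paper": "Paper B", "kcat_per_s": None},
]
df = pd.DataFrame(records)

# Step 4: chart the extracted evidence per node.
summary = df.groupby("node")["kcat_per_s"].mean()
ax = summary.plot.bar()
ax.set_ylabel("kcat (1/s)")
plt.tight_layout()
plt.savefig("node_evidence.png")
```

The missing piece is exactly the linking problem below: nothing ties each row back to the node in Roam or to the page of the PDF it came from.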


It’s hard to link the idea graph and the primary evidence in a way that

  1. Renders the primary source easily accessible
  2. Allows for fluid data manipulation
  3. Allows for easy data visualization
  4. (Wishlist) propagates uncertainty in one concept to dependent nodes (make Roam an actual PGM, or make PGMs more interpretable and editable)
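The wishlist item (#4) can at least be prototyped by hand: treat leaf nodes as uncertain measurements pulled from papers and propagate to dependent nodes by Monte Carlo sampling. A toy sketch, with entirely hypothetical rate constants; a real tool would do this over the whole idea graph:

```python
import random

random.seed(0)  # reproducible toy run

def sample_kd():
    # Leaf nodes: (mean, std) estimates read off from papers -- made-up numbers.
    k_on = random.gauss(1.0e6, 2.0e5)   # binding rate, 1/(M*s)
    k_off = random.gauss(0.1, 0.03)     # unbinding rate, 1/s
    # Dependent node: dissociation constant Kd = k_off / k_on.
    return k_off / k_on

samples = [sample_kd() for _ in range(10_000)]
mean = sum(samples) / len(samples)
std = (sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)) ** 0.5
print(f"Kd ~ {mean:.2e} +/- {std:.1e} M")
```

This is the "make Roam an actual PGM" direction in miniature: if a paper later tightens the estimate of one leaf, re-running the sampler updates the uncertainty of everything downstream.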

There are a few reasons for this:

  1. PDFs are a terrible, noisy filter for the raw data (see below)
  2. Solutions exist for #2 and #3 above (for example, Mathematica or Jupyter notebooks), but PDFs are such a terrible core source of information that it’s hard to add them to this stack
  3. Science is nuanced, and any time you put things in spreadsheets you lose some important context. Papers capture some of this nuance, but in an unstructured way. 

My core issue with reason #3 is that you should then be able to add structure to your idea graph and incorporate the primary evidence into the updated graph, with notes on your uncertainties. This, to me, is the recursive cycle at the heart of the process. A more fluid tool would allow for many more layers of iteration here. 

Appendix: Why I hate papers so much as a way to get scientific info

How science works: 

  1. Collect data, put in spreadsheet (scientist)
  2. Make a JPEG from the spreadsheet (scientist)
  3. Put the JPEG in a PDF (scientist)
  4. Extract JPEG from PDF (you)
  5. Extract data points from JPEG (you)
  6. Put data points in spreadsheet (you)

The paper PDF is literally a noisy, compressed filter on the information you want.
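Step 5 above (reading data points back off a figure) boils down to a linear pixel-to-data calibration per axis. A sketch, where the calibration pixels and clicked marker positions are hypothetical:

```python
def make_axis_map(px0, val0, px1, val1):
    """Return a pixel -> data-value map from two known axis ticks."""
    scale = (val1 - val0) / (px1 - px0)
    return lambda px: val0 + (px - px0) * scale

# x-axis: tick "0" at pixel 50, tick "10" at pixel 450.
to_x = make_axis_map(50, 0.0, 450, 10.0)
# y-axis: tick "0" at pixel 400, tick "100" at pixel 40 (y pixels grow downward).
to_y = make_axis_map(400, 0.0, 40, 100.0)

clicked = [(130, 310), (250, 220), (370, 130)]  # pixel positions of markers
data = [(to_x(px), to_y(py)) for px, py in clicked]
print(data)  # recovered (x, y) data coordinates
```

Tools like WebPlotDigitizer automate this, but the point stands: you are reverse-engineering a rendering of a spreadsheet the scientist already had.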

12 responses
There have been some attempts to build algorithms that mine biomedical papers full of unstructured data, measuring the strength of the relationships between papers and graphing hypotheses in aggregate. I believe IBM tried this a few years back as well, but I couldn't find links to the example.
You might be interested in what we're building at scite: "Smart Citations" – citations that let users see how a scientific paper has been cited, by providing the context of each citation and a classification of whether it offers supporting or disputing evidence for the cited claim. We're announcing visualizations next week, but you can already explore the platform.