Ross P. Sullivan Kelly: I’m a data scientist, and I was a chemist for a while before that. I love (and do most of my work in) python, although lately more SQL. My pronouns are he/him.


Check it out! I’ve been wanting to do something with the space weather APIs for a while, and it dovetailed nicely with some of my new fave tools, duckdb and great_tables. (NASA has a wealth of awesome APIs, and it really makes you appreciate all the work they do. Also the names they choose. So. Great.)

DuckDB

It’s basically a new, shiny, SQLite. I think there are really meaningful distinctions that I, uh, don’t care to get into. It’s completely embedded, which is great because I never want to manage a database (narrator: that’ll probably be his next blog post). Before actually playing around with it, I’d mostly heard of duckdb in the context of running ad hoc queries against flat text files directly in the terminal. Which is super cool, but I have no desire to do that. What I do want to do is look at NASA data. Which is, amazingly, this easy:

        
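import duckdb

# duckdb.sql expects a SQL string, so the URL goes inside read_json.
# Assumes DuckDB can load its httpfs extension to fetch over HTTPS.
that_sweet_cme_data = duckdb.sql(
    "SELECT * FROM read_json('https://api.nasa.gov/DONKI/CME?api_key=DEMO_KEY')"
).df()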

That’s it. It’s wild. Ok, sure, it’s in the category of libraries that abstract so much away that who knows what’s going on. That sort of nonsense is what makes python a delight, and I’m here for it. But honestly, what I’ve found super cool is some of the new query language features described here, here, and here. My faves? Reusable column aliases (think select 21 as a_number, a_number * 2 as the_number) and unnest, which explodes structs into new columns. It’s a joy. As opposed to bigquery’s unnest. Which only unnests disappointment.

Great Tables

It’s a python library that makes nicely formatted tables. ‘Cause know what’s really hard? Making tables actually nice. To the point that there’s a whole generation of data scientists (it’s me. I’m the generation) who are so drunk on fancy datavis tooling that we forget that the best way to convey information is ordered text and numbers on a grid. Thanks to the finest PM & UX person I know, Phillip Miller, for helping me realize that.

Anyhoo, what makes great tables nifty is not all the actual good stuff about how it builds and formats tables elegantly (which it does), or that it does so on notebooks and websites consistently (it does that too), but the nanoplots. (Also: another wicked logo.) These are sparklines. That fit into a cell on the table. And just work. Sparklines are one of the absolute best ways to illuminate information without totally going overboard into graph-land. They give you just enough information to see trends but no more, and it’s great. They’re the “this meeting should have been an email” of line charts.

A cousin, to my mind, is this remarkable font, Chartwell, that does some sort of ligature trickery to just let you type numbers and turn them into charts. If it weren’t so expensive I would buy it.

All the more wonderful is how you specify the data for a nanoplot: just have a space-delimited string of numbers and there you have it. Like magic. In this case I’m actually using a 2D plot, because I wanted a sense of frequency over time. And it works fantastically. And, I learned something! Take a look at any ER CMEs in the table: they come in twos, because there are multiple components of the same events. I wouldn’t have looked that up if I hadn’t seen the plots. So cool.

So much chit chat about libraries and I didn’t even talk about the sun stuff! Next time!
