Posts by Tags

datascience

Scraping Reddit, part 2

8 minute read

Published:

The last post dealt with using pushshift and handling requests to access posts and comments from Reddit. This post deals with using the Python Reddit API Wrapper (PRAW) to access posts and comments from Reddit and then using some NLP tools for basic sentiment analysis.

Scraping Reddit, part 1

10 minute read

Published:

In light of recent internet trends about retail investors, I’m sure many of us have questions about the kinds of content that get posted on Reddit, and whether there are home-grown, analytical ways of addressing these questions. I’ll be showing two ways of parsing submissions and comments to Reddit, this one focusing on pushshift API endpoints accessed with the requests library, some custom classes for processing these responses, and asyncio to handle asynchronous threading for multiple requests to pushshift.
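As a rough illustration of the asyncio idea mentioned above (not the code from the post itself), here is a minimal sketch that runs several blocking calls concurrently in worker threads. The `blocking_fetch` function and the endpoint names are hypothetical stand-ins; in practice the function body would be a `requests.get` call against a pushshift endpoint.

```python
import asyncio
import time


def blocking_fetch(endpoint):
    """Hypothetical stand-in for a blocking HTTP call such as requests.get."""
    time.sleep(0.01)  # simulate network latency
    return {"endpoint": endpoint, "status": "ok"}


async def fetch_all(endpoints):
    """Run each blocking fetch in a worker thread and wait for all of them."""
    tasks = [asyncio.to_thread(blocking_fetch, e) for e in endpoints]
    # gather returns results in the same order as the tasks were given
    return await asyncio.gather(*tasks)


# Endpoint names here are illustrative assumptions only
results = asyncio.run(fetch_all(["submission", "comment"]))
```

`asyncio.to_thread` needs Python 3.9 or newer; on older versions, `loop.run_in_executor` plays the same role.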

Poetry and Docker

6 minute read

Published:

What is Poetry and where does it fit in the Python software/DS ecosystem? Plus some beginner forays into Docker.

Downloading and studying my message behavior

5 minute read

Published:

Digital privacy is everywhere, and recent laws are pushing companies to disclose whatever personal information they may have on you. In the spirit of science, I’m going to make myself my own study subject and observe what Facebook has stored from my messenger history. Along the way, I’ll do some recursion, a little parallelization, some generators for data processing, and basic visualization to observe my messenger behavior. Notebooks can be found here, but this one you can’t reproduce because I won’t be providing my messenger data (try this notebook on your own messenger data if you’re curious).

NBA defensive schemes

7 minute read

Published:

Team defensive schemes

Does a team’s defensive scheme influence opponents’ shot portfolios? I’m going to be using a different NBA API to query all the games for the 2018-2019 regular season. In each game, I’m going to log each “make” against a team’s defense. For example, if Houston makes a 22-ft 3-point shot against Boston, I will log the distance at which Houston scored against Boston.
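To make the logging idea concrete, here is a minimal sketch that groups made-shot distances by the defending team. The record layout (`offense`, `defense`, `distance_ft`, `made`) and the sample shots are assumptions for illustration; they are not the format returned by any particular NBA API.

```python
from collections import defaultdict


def log_makes(shots):
    """Collect the distance of every made shot, keyed by the defending team."""
    makes_against = defaultdict(list)
    for shot in shots:
        if shot["made"]:  # misses are ignored; only "makes" are logged
            makes_against[shot["defense"]].append(shot["distance_ft"])
    return dict(makes_against)


# Hypothetical shot records, including the Houston-vs-Boston example
shots = [
    {"offense": "HOU", "defense": "BOS", "distance_ft": 22, "made": True},
    {"offense": "HOU", "defense": "BOS", "distance_ft": 3, "made": False},
    {"offense": "BOS", "defense": "HOU", "distance_ft": 15, "made": True},
]
makes = log_makes(shots)
```

Each defending team ends up with a list of distances, which can then be turned into a histogram of the shots it allows.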

gradSchool

Technical Debt

6 minute read

Published:

Technical debt is an ongoing, ever-pressing issue for any large, collaborative code base. If left unaddressed, technical debt can seriously cripple productivity.

Introduction

1 minute read

Published:

Hello world

This is my first post. I’m Alex. I’m from the Northern Virginia area. I like chemical engineering, chemistry, computer science, and scientific computing/data science.

molecularmodeling

Molecular Modeling 2

12 minute read

Published:

Conducting a simulation

Running a simulation means taking a model and sampling some sort of distribution with it. Example simulation

Molecular Modeling Software: MDTraj

7 minute read

Published:

Some anecdotes with analyzing simulations

Let’s say you’ve conducted a simulation. Everything up to that point (parametrization, initialization, actually running the simulation) will be assumed and probably discussed another day. What you have from a simulation is a trajectory (timeseries of coordinates), and now we have to derive some meaningful properties from this trajectory.

Molecular Modeling 1

5 minute read

Published:

How do you model something?

Let’s talk about molecular modeling from both the chemistry and mathematical standpoints. When you want to model something, what do you need?

Introduction

1 minute read

Published:

Hello world

This is my first post. I’m Alex. I’m from the Northern Virginia area. I like chemical engineering, chemistry, computer science, and scientific computing/data science.

personal

Scraping Reddit, part 2

8 minute read

Published:

The last post dealt with using pushshift and handling requests to access posts and comments from Reddit. This post deals with using the Python Reddit API Wrapper (PRAW) to access posts and comments from Reddit and then using some NLP tools for basic sentiment analysis.

Scraping Reddit, part 1

10 minute read

Published:

In light of recent internet trends about retail investors, I’m sure many of us have questions about the kinds of content that get posted on Reddit, and whether there are home-grown, analytical ways of addressing these questions. I’ll be showing two ways of parsing submissions and comments to Reddit, this one focusing on pushshift API endpoints accessed with the requests library, some custom classes for processing these responses, and asyncio to handle asynchronous threading for multiple requests to pushshift.

Poetry and Docker

6 minute read

Published:

What is Poetry and where does it fit in the Python software/DS ecosystem? Plus some beginner forays into Docker.

Downloading and studying my message behavior

5 minute read

Published:

Digital privacy is everywhere, and recent laws are pushing companies to disclose whatever personal information they may have on you. In the spirit of science, I’m going to make myself my own study subject and observe what Facebook has stored from my messenger history. Along the way, I’ll do some recursion, a little parallelization, some generators for data processing, and basic visualization to observe my messenger behavior. Notebooks can be found here, but this one you can’t reproduce because I won’t be providing my messenger data (try this notebook on your own messenger data if you’re curious).

NBA defensive schemes

7 minute read

Published:

Team defensive schemes

Does a team’s defensive scheme influence opponents’ shot portfolios? I’m going to be using a different NBA API to query all the games for the 2018-2019 regular season. In each game, I’m going to log each “make” against a team’s defense. For example, if Houston makes a 22-ft 3-point shot against Boston, I will log the distance at which Houston scored against Boston.

Fantasy NBA 2

14 minute read

Published:

Part 2 of evaluating fantasy NBA draft picks - modeling and sampling for expected fantasy output.

Fantasy NBA 1

1 minute read

Published:

Part 1 of evaluating fantasy NBA draft picks - first gathering the relevant data.

Introduction

1 minute read

Published:

Hello world

This is my first post. I’m Alex. I’m from the Northern Virginia area. I like chemical engineering, chemistry, computer science, and scientific computing/data science.

scientificComputing

Bayesian Methods 1

11 minute read

Published:

First attempt at using PyMC3 for Bayesian parameter estimation

Applying some principles from earlier MCMC posts/notebooks to estimate the parameters of a linear model.

Molecular Modeling 2

12 minute read

Published:

Conducting a simulation

Running a simulation means taking a model and sampling some sort of distribution with it. Example simulation

Technical Debt

6 minute read

Published:

Technical debt is an ongoing, ever-pressing issue for any large, collaborative code base. If left unaddressed, technical debt can seriously cripple productivity.

Molecular Modeling Software: MDTraj

7 minute read

Published:

Some anecdotes with analyzing simulations

Let’s say you’ve conducted a simulation. Everything up to that point (parametrization, initialization, actually running the simulation) will be assumed and probably discussed another day. What you have from a simulation is a trajectory (timeseries of coordinates), and now we have to derive some meaningful properties from this trajectory.

Molecular Modeling 1

5 minute read

Published:

How do you model something?

Let’s talk about molecular modeling from both the chemistry and mathematical standpoints. When you want to model something, what do you need?

Introduction

1 minute read

Published:

Hello world

This is my first post. I’m Alex. I’m from the Northern Virginia area. I like chemical engineering, chemistry, computer science, and scientific computing/data science.