I have never really worked with PITCHf/x data and was wondering where the raw data can be found. I want to try out a few data visualization ideas and am looking for a place where I can get all of the PITCHf/x data in a few files. Does this exist? I'm sure this has been asked before, but I couldn't find it in the first few pages of search results on this site.
pitch f/x
-
It depends on what you want to do. If you want to look at specific players, JoeLefkowitz.com will allow you to download the data in Excel/CSV format. If you want all the PITCHf/x data, the 2007-2009 data is available to download from Darrell and Jeff Zimmerman's site here:
I can help you with instructions for scraping the data from the MLB.com website if the above is not enough for you.
"Jesus said to them, 'Truly I tell you, the tax collectors and the prostitutes are going into the kingdom of God ahead of you.'"
-
Thanks guys. Do you have instructions for scraping already laid out in a Word doc or something? I will look at what you guys have sent, but I suspect in the future I may want to write some scripts to automate scraping the newest data from its source and keep my db up to date.
-
Note: links to updated versions of scripts and database structure are found at the end of this post. Also, I highly recommend using XAMPP (or Mac equivalent) to install all this software with one e…
But if you do that, use XAMPP to greatly simplify the installation process (assuming you're on Windows or Linux), and get the updated scripts that I link at the end of the post.
My scripts are in Perl, heavily adapted from what Joseph Adler did in Baseball Hacks.
An alternative is to use the Baseball on a Stick scripts, which were written in Python by Kyle Wilkomm and others:
Download Baseball On A Stick for free. This project is intended to provide code to be used with MySQL and Python to create a database of major league baseball game events, which are freely provided by the mlb.com Gameday application. Older versions also supported creating a retrosheet.org database, but that is no longer supported.
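To give a rough idea of what any of these scripts do at their core, here is a minimal Python sketch that parses pitch records out of a Gameday-style XML file and loads them into a table. It is not taken from my scripts or from Baseball on a Stick. The sample XML, the attribute names (`des`, `type`, `start_speed`, `px`, `pz`), the table layout, and the function names are all assumptions for illustration, and `sqlite3` stands in for MySQL so the example runs without a server.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like a Gameday inning file.
# The element and attribute names here are assumptions, not a spec.
SAMPLE_XML = """
<inning num="1">
  <top>
    <atbat num="1" batter="100001" pitcher="200001">
      <pitch id="3" des="Called Strike" type="S" start_speed="92.4" px="-0.31" pz="2.45"/>
      <pitch id="4" des="Ball" type="B" start_speed="84.1" px="1.12" pz="1.02"/>
    </atbat>
  </top>
</inning>
"""


def to_float(value):
    """Convert an attribute to float, keeping missing values as None."""
    return float(value) if value not in (None, "") else None


def parse_pitches(xml_text):
    """Return one tuple per <pitch> element, tagged with its at-bat number."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for atbat in root.iter("atbat"):
        for pitch in atbat.iter("pitch"):
            rows.append((
                int(atbat.get("num")),
                int(pitch.get("id")),
                pitch.get("des"),
                pitch.get("type"),
                to_float(pitch.get("start_speed")),
                to_float(pitch.get("px")),
                to_float(pitch.get("pz")),
            ))
    return rows


def load_pitches(conn, rows):
    """Create the table if needed and insert rows, skipping ones already loaded."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS pitches (
               atbat_num INTEGER, pitch_id INTEGER, des TEXT, type TEXT,
               start_speed REAL, px REAL, pz REAL,
               PRIMARY KEY (atbat_num, pitch_id))"""
    )
    conn.executemany(
        "INSERT OR IGNORE INTO pitches VALUES (?, ?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
rows = parse_pitches(SAMPLE_XML)
load_pitches(conn, rows)
load_pitches(conn, rows)  # a second run is ignored because of the primary key
count = conn.execute("SELECT COUNT(*) FROM pitches").fetchone()[0]
```

A real table would also need a game identifier in the key, since at-bat and pitch numbers repeat from game to game.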
-
Yeah, Darrell and Jeff Zimmerman's site looks great. They even have some scripts set up for loading, which is perfect. I am also interested in keeping my db 'current' and not having to wait on other folks to post their changes, but that is in the future and this is great for now. KS, if you have anything already available I will gladly have a look. Otherwise I may bug you down the road as I get a little more comfortable with all of this.
-
KS, in terms of setting up from the start, which of these steps would you recommend?
1) Use your initial scraper, dbparse, and pitch update scripts to do the initial population of your db structure, then run the Zimmerman scraper scripts to maintain it?
2) Use your initial scraper, dbparse, and pitch update scripts to do the initial population of your db structure, and maintain it using the scraper scripts you wrote?
3) Or do the Zimmerman scripts have all the changes you made in 2011, so that I should grab the 400+ MB MySQL dump and maintain it with their scripts? I notice they are grabbing some batter and pitcher info that your scraper script does not seem to have, but I don't see it referenced in the XML import script.
Doing a diff between your files, I don't see much difference except the pitcher and batter data pulled by the Zimmermans. So my instinct is to take their route, as they seem to have a version that is up to date with yours and can scrape every day with decent speed?
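Whichever set of scripts ends up doing the maintenance, the daily update boils down to working out which days are missing from the db and fetching only those. A minimal Python sketch of that bookkeeping follows. The function names are hypothetical, and the `year_/month_/day_` directory layout is an assumption about how the source organizes its files, not something confirmed in this thread.

```python
from datetime import date, timedelta


def dates_to_fetch(last_loaded, today):
    """Return every date after last_loaded up to and including yesterday."""
    days = []
    current = last_loaded + timedelta(days=1)
    while current < today:
        days.append(current)
        current += timedelta(days=1)
    return days


def day_path(d):
    """Build a relative directory path for one day (assumed layout)."""
    return "year_{:04d}/month_{:02d}/day_{:02d}/".format(d.year, d.month, d.day)


# Example: the db was last updated through April 28 and today is May 2.
pending = dates_to_fetch(date(2011, 4, 28), date(2011, 5, 2))
paths = [day_path(d) for d in pending]
```

Stopping at yesterday avoids loading games that are still in progress. In practice `last_loaded` would come from a query for the latest game date already in the db.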
-
I made some substantial speed improvements from my original versions of the scripts. I'm not sure how the Zimmerman scripts differ from mine. If I understand which batter and pitcher info you are talking about, i.e., the pbp/batter and pbp/pitcher files, MLBAM has discontinued providing that information (which was duplicate information anyhow).