pitch f/x

  • pitch f/x

    I have never really played with pitch f/x data and was wondering where the raw data can be found. I want to try out a few data visualization ideas and am looking for a place where I can get all of the pitch f/x data in a few files. Does this exist? I'm sure this has been asked before, but I couldn't find it in the first few pages of search results on this site.

  • #2
    You could try http://www.brooksbaseball.net/

    • #3
      It depends on what you want to do. If you want to look at specific players, JoeLefkowitz.com will allow you to download the data in Excel/CSV format. If you want all the PITCHf/x data, the 2007-2009 data is available to download from Darrell and Jeff Zimmerman's site here:


      I can help you with instructions for scraping the data from the MLB.com website if the above is not enough for you.
      "Jesus said to them, 'Truly I tell you, the tax collectors and the prostitutes are going into the kingdom of God ahead of you.'"

      • #4
        Thanks guys. Do you have instructions for scraping already laid out in a Word doc or something? I will look at what you guys have sent, but I suspect that in the future I may want to write some scripts to automate scraping the newest data from its source and keep my db up to date.

        • #5
          Originally posted by willgladst
          Thanks guys. Do you have instructions for scraping already laid out in a Word doc or something? I will look at what you guys have sent, but I suspect that in the future I may want to write some scripts to automate scraping the newest data from its source and keep my db up to date.
          You can read the instructions I wrote for it back in 2007 on my old blog, here:
          Note: links to updated versions of scripts and database structure are found at the end of this post. Also, I highly recommend using XAMPP (or Mac equivalent) to install all this software with one e…


          But if you do that, use XAMPP to greatly simplify the installation process (assuming you're on Windows or Linux), and get the updated scripts that I link at the end of the post.

          My scripts are in Perl, heavily adapted from what Joseph Adler did in Baseball Hacks.

          An alternative is to use the Baseball on a Stick scripts, which were written in Python by Kyle Wilkomm and others:
          Download Baseball On A Stick for free. This project is intended to provide code to be used with MySQL and Python to create a database of major league baseball game events which are freely provided by the mlb.com Gameday application. Older version also support creating a retrosheet.org database but that is no longer supported.
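
For readers who want a feel for what such a scraper does once a game file has been downloaded, here is a minimal Python sketch. It is not the Perl scripts or the Baseball on a Stick code. The XML layout and attribute names in `SAMPLE_XML` are assumptions made for illustration, the IDs and values are made up, and SQLite stands in for MySQL so the example runs without a database server.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample in the style of a Gameday inning file. The attribute
# names and values are illustrative assumptions, not copied from a real game.
SAMPLE_XML = """
<inning num="1">
  <top>
    <atbat num="1" batter="111111" pitcher="222222">
      <pitch id="3" des="Called Strike" type="S" start_speed="92.4" px="-0.31" pz="2.45" pitch_type="FF"/>
      <pitch id="4" des="Ball" type="B" start_speed="84.1" px="1.20" pz="1.02" pitch_type="SL"/>
    </atbat>
  </top>
</inning>
"""

# Attributes copied from each <pitch> element (assumed names).
PITCH_FIELDS = ("id", "des", "type", "start_speed", "px", "pz", "pitch_type")


def parse_pitches(xml_text):
    """Return one dict per <pitch>, tagged with its parent at-bat's player IDs."""
    root = ET.fromstring(xml_text)
    rows = []
    for atbat in root.iter("atbat"):
        for pitch in atbat.iter("pitch"):
            # .get() returns None for a missing attribute, which SQLite stores as NULL.
            row = {field: pitch.get(field) for field in PITCH_FIELDS}
            row["batter"] = atbat.get("batter")
            row["pitcher"] = atbat.get("pitcher")
            rows.append(row)
    return rows


def load_pitches(conn, rows):
    """Create the (hypothetical) pitches table if needed and insert the rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pitches ("
        "pitch_id INTEGER, des TEXT, type TEXT, start_speed REAL, "
        "px REAL, pz REAL, pitch_type TEXT, batter INTEGER, pitcher INTEGER)"
    )
    conn.executemany(
        "INSERT INTO pitches VALUES "
        "(:id, :des, :type, :start_speed, :px, :pz, :pitch_type, :batter, :pitcher)",
        rows,
    )
    conn.commit()
```

To try it, call `load_pitches(sqlite3.connect(":memory:"), parse_pitches(SAMPLE_XML))` and query the `pitches` table. A real scraper would also record the game ID, inning, and at-bat number for each pitch.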

          • #6
            Yeah, Darrell and Jeff Zimmerman's site looks great. They even have some scripts set up for loading, which is perfect. I am also interested in keeping my db 'current' and not having to wait on other folks to post their changes, but that is in the future and this is great for now. KS, if you have anything already available I will gladly have a look. Otherwise I may bug you down the road as I get a little more comfortable with all of this.

            • #7
              Oh, awesome. This is even better than it looked at first glance. I only looked quickly before, but Darrell and Jeff's site also has scraper scripts. Thanks so much KS. I think I can figure out what they are doing by going through their Perl scripts without having to trouble you.

              • #8
                Originally posted by willgladst
                Yeah, Darrell and Jeff Zimmerman's site looks great. They even have some scripts set up for loading, which is perfect. I am also interested in keeping my db 'current' and not having to wait on other folks to post their changes, but that is in the future and this is great for now. KS, if you have anything already available I will gladly have a look. Otherwise I may bug you down the road as I get a little more comfortable with all of this.
                Yeah, the Zimmerman brothers used my scripts, so if you look at my site, which I linked in the post above, it should be compatible with what they did. I've made some script changes to parse the new fields that MLBAM has added over the years, and there are a few corresponding database changes, but those are detailed in the updates on my site (labeled May 2011).
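
As an illustration of what a "corresponding database change" can look like when a data feed gains new fields, here is a small Python sketch that adds any missing columns to an existing table. It is not the update described above: the column name `spin_rate` in the usage note is a hypothetical example, and SQLite is used in place of MySQL so the snippet is self-contained.

```python
import sqlite3


def add_missing_columns(conn, table, wanted):
    """Add any columns in `wanted` (name -> SQL type) that `table` lacks.

    Returns the list of column names that were added. Table and column
    names are interpolated into SQL, so pass only trusted identifiers.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for name, sql_type in wanted.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type}")
            added.append(name)
    conn.commit()
    return added
```

For example, `add_missing_columns(conn, "pitches", {"spin_rate": "REAL"})` adds the column on the first call and does nothing on later calls, so it is safe to run before every load. Rows loaded before the change simply hold NULL in the new column.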

                • #9
                  KS, in terms of setting up from the start, would the steps you recommend be:

                  1) Use your initial scraper, dbparse, and pitch update scripts to do the initial population of your db structure, then run the Zimmerman scraper scripts to maintain it?
                  2) Use your initial scraper, dbparse, and pitch update scripts to do the initial population of your db structure, then maintain it using the scraper scripts you wrote?
                  3) Or do the Zimmerman scripts have all the changes you made in 2011, in which case I should grab the 400+ MB MySQL dump and maintain it with their scripts? I notice they are grabbing some batter and pitcher info that your scraper script does not seem to have, but I don't see it referenced in the XML import script.

                  Doing a diff between your files, I don't see much difference except the pitcher and batter data pulled by the Zimmermans. So my instinct is to take their route, since they seem to have a version that is up to date with yours and can scrape every day with decent speed?

                  • #10
                    I made some substantial speed improvements from my original versions of the scripts. I'm not sure how the Zimmerman scripts differ from mine. If I understand which batter and pitcher info you are talking about, i.e., the pbp/batter and pbp/pitcher files, MLBAM has discontinued providing that information (which was duplicate information anyhow).
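
On the question of keeping a database current, one common approach is to scrape only the days after the most recent date already loaded. The Python sketch below shows that idea. The function names are hypothetical, and the `year_/month_/day_` directory naming in `day_path` is an assumption that should be checked against the actual data source before use.

```python
from datetime import date, timedelta


def dates_to_scrape(last_loaded, today):
    """Yield each date after `last_loaded`, up to and including `today`."""
    day = last_loaded + timedelta(days=1)
    while day <= today:
        yield day
        day += timedelta(days=1)


def day_path(day):
    """Build a per-day directory path (assumed naming scheme, verify before use)."""
    return f"year_{day:%Y}/month_{day:%m}/day_{day:%d}/"
```

In practice `last_loaded` would come from a query such as the maximum game date in the database. Including `today` can pick up games that are still in progress, so a nightly job may prefer to stop at yesterday.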
