Baseball Prospectus Annual 2015 - a quick review


  • #31
    Originally posted by Kevin Seitzer
    It's not run on Excel spreadsheets any more. When Wyers was there, he built a data warehouse for them and migrated PECOTA to SQL. But it was not a small project. Having built a projection system myself now, with assistance of one other programmer, I can testify that it is not something you turn around in a short time. It took a fair number of man-months.
    are you using something more than is in the normal databases (like what is in sean lahman's)? like every pitch from pitchf/x or something? if not, then you don't need a data warehouse and SQL - pulling an individual player's stats out of a simple csv text file takes a few seconds - the files are only a few MB. doing the regression is straightforward and should be fast. i'm still not seeing the difficulty. yes, a month or two, to dot the i's and cross the t's, but i would think BP would be willing to invest that in order to improve the quality of their product...
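
A minimal sketch of the CSV lookup described above, using only Python's standard library. The sample rows are made up for illustration, and the column names (playerID, yearID, AB, H) are assumptions modeled on a Lahman-style batting table, not a confirmed schema.

```python
import csv
import io

# Made-up sample rows standing in for a small batting CSV file.
# Column names are assumptions; a real file may differ.
SAMPLE = """playerID,yearID,AB,H
examp01,2013,500,150
examp01,2014,520,140
other01,2014,300,75
"""


def player_rows(csv_file, player_id):
    """Return every row for one player from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    return [row for row in reader if row["playerID"] == player_id]


# io.StringIO stands in for open("Batting.csv", newline="") on a real file.
rows = player_rows(io.StringIO(SAMPLE), "examp01")
career_hits = sum(int(row["H"]) for row in rows)
```

This is a single linear scan of the file, which is the kind of lookup the post has in mind for a file of a few MB.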
    "Instead of all of this energy and effort directed at the war to end drugs, how about a little attention to drugs which will end war?" Albert Hofmann



    • #32
      Originally posted by bryanbutler
      are you using something more than is in the normal databases (like what is in sean lahman's)? like every pitch from pitchf/x or something? if not, then you don't need a data warehouse and SQL - pulling an individual player's stats out of a simple csv text file takes a few seconds - the files are only a few MB. doing the regression is straightforward and should be fast. i'm still not seeing the difficulty. yes, a month or two, to dot the i's and cross the t's, but i would think BP would be willing to invest that in order to improve the quality of their product...
      I'm not going to address what exactly I do, but what BP does for PECOTA (and what Szymborski does for ZiPS) is more than what is in Lahman's database, yes. For starters, Lahman is only MLB. But it is much more than that. The system you are describing is something akin to what Bill James was doing back in the early 90s and is way simpler than what projection systems do these days.
      "Jesus said to them, 'Truly I tell you, the tax collectors and the prostitutes are going into the kingdom of God ahead of you.'"



      • #33
        I'm not suggesting that developing a modern projection system is a gargantuan intellectual exercise. A number of people have done it. It is what one of my old professors called "straightforward but non-trivial." The devil is in the details. Though many people have done it, they have pretty much all done it over the course of a year or more, and there is good reason for that. Then once the first revision is complete you are constantly going back to address the simplifying assumptions and leaps of logic that you made to get there. A projection system is always a living document (or code base, if you will).

        No, it doesn't take months to run the code once it's written. That takes hours. But testing and revising the code--eliminating bugs, adding features, improving underlying assumptions--is what takes time.
        "Jesus said to them, 'Truly I tell you, the tax collectors and the prostitutes are going into the kingdom of God ahead of you.'"



        • #34
          Originally posted by Kevin Seitzer
          It's not because Silver was some genius and the people who followed him are ignoramuses.
          I don't think those who followed him are ignoramuses, but I don't think they measure up to Nate. Which is no disgrace since I believe Nate is a genius.
          I'm just here for the baseball.



          • #35
            Originally posted by Kevin Seitzer

            No, it doesn't take months to run the code once it's written. That takes hours. But testing and revising the code--eliminating bugs, adding features, improving underlying assumptions--is what takes time.
            i must have misunderstood. i thought you implied that PECOTA took months to run, because they started it right at the end of last season, and it didn't complete until roughly now.

            i fully agree that the devil is in the details and you'll always be tweaking. again, i was just flabbergasted by what seemed an incredible inefficiency in actually running PECOTA. it seems what you meant is that they are tweaking the algorithm during that period. apologies for misunderstanding what you were saying.

            ETA: but i still don't think it's that complex .
            "Instead of all of this energy and effort directed at the war to end drugs, how about a little attention to drugs which will end war?" Albert Hofmann



            • #36
              Mike, really appreciate your comments here. We are fortunate that you're sharing your perspective. Very interesting. Thanks!!!



              • #37
                Originally posted by bryanbutler
                ETA: but i still don't think it's that complex .
                this was too flippant, and the sarcastic smiley doesn't quite convey what i meant. i don't want to make light of the effort that mike and others that do this for real put into it. as he said, the devil is in the details, and there are a lot of details. what i meant is that in general this type of problem isn't terribly complex. you have past data (majors and minors - even high school i guess if you have it and it's worthwhile); you have modifiers for that data for each year (park adjustments, injury adjustments if you want, even intangibles if you want to build that in somehow, probably most importantly a weight); you regress to get an estimate of production; you then predict how many PA a player will get and map the production onto those PA. none of that is too tricky in principle.
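
A rough sketch of the steps listed above, not a description of how PECOTA or any other real system works. All names here (Season, park_factor, regression_pa, and so on) are hypothetical, and "regress" is implemented as a simple shrink toward a league-average rate, which is one assumption among many possible ones.

```python
from dataclasses import dataclass


@dataclass
class Season:
    pa: int                    # plate appearances in that season
    rate: float                # raw production per PA (any rate stat)
    weight: float              # recency weight; larger means more recent
    park_factor: float = 1.0   # hypothetical park modifier; 1.0 is neutral


def project_rate(seasons, league_rate, regression_pa):
    """Weighted, park-adjusted rate, shrunk toward the league rate.

    regression_pa is the number of league-average PA blended in, so
    players with little data are pulled harder toward league_rate.
    """
    weighted_pa = sum(s.weight * s.pa for s in seasons)
    weighted_prod = sum(
        s.weight * s.pa * (s.rate / s.park_factor) for s in seasons
    )
    return (weighted_prod + regression_pa * league_rate) / (
        weighted_pa + regression_pa
    )


def project_total(seasons, league_rate, regression_pa, projected_pa):
    """Map the projected rate onto a separately estimated PA total."""
    return project_rate(seasons, league_rate, regression_pa) * projected_pa


# Made-up numbers for illustration only.
history = [
    Season(pa=600, rate=0.120, weight=5),
    Season(pa=500, rate=0.100, weight=4),
    Season(pa=400, rate=0.110, weight=3),
]
rate = project_rate(history, league_rate=0.100, regression_pa=1200)
total = project_total(history, 0.100, 1200, projected_pa=550)
```

The weights, the park factors, the amount of regression, and the projected PA are all plain inputs here. Choosing them well is where the details come in.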

                in practice, it's the modifiers that make the difference (those devilish details), along with the estimate of the PA (for those with any uncertainty about it). those are the details that mike (and others) tweak to get right, that those of us sitting on the sidelines probably aren't capable of. if we were, we'd be doing it, after all.

                so, again, apologies mike - i didn't mean to make light of the effort you and others put into getting projections as good as you can.
                "Instead of all of this energy and effort directed at the war to end drugs, how about a little attention to drugs which will end war?" Albert Hofmann



                • #38
                  To anyone who bought this book, please take a few moments of your day to go and read the Josh Harrison capsule.

                  It's probably the most amusing thing in the entire book!



                  • #39
                    I agree, SilentMist. Read it already. A great example of how great the capsules in the book were this year.

