astro-ph Rationalized

Here is probably the single most helpful thing I have ever done for the world. Last month Paul Ginsparg, who did a world-changing thing by inventing the arxiv system for sharing scientific preprints, was visiting Pasadena, and dropped by Caltech. We chatted a bit about blogs, the internet, the preprint server, ways one might incorporate links to blogs and talks and newspaper articles and all that (some of which already exists in the form of trackbacks). And he told me a fun math problem I will blog about at some point.

And then he asked, “Is there any other obvious way the arxiv could be improved?” To which I naturally responded, “You mean in addition to subdividing astro-ph into categories?”

The problem with science is that there’s just too damn much of it. Every weekday, when one peeks at the new listings on astro-ph, one is faced with 40 to 50 new abstracts to read. That’s a lot of science to wade through, and it’s especially bad for people who work on the boundaries and might also be interested in hep-th, gr-qc, hep-ph, and/or other categories. (I haven’t yet broken down and started reading quant-ph.) Especially since, just because you are interested in issues at the interfaces of conventionally-defined disciplinary boundaries, it doesn’t follow that you are interested in every single kind of research that is carried out in every one of those disciplines. An early-universe cosmologist, for example, might not be interested in star formation or the interstellar medium. Or they might be; but perhaps not.

Nevertheless, everything astronomy-related on the arxiv gets put into astro-ph, from models of inflation to light curves of W UMa contact binaries. And if one was interested only in some subset, one needed to sift through the 50 abstracts to search for the few that struck a chord.

Until now! Paul and Mark Wise and I chatted for ten minutes and came up with a perfectly sensible (I like to think) set of categories into which astro-oriented papers would mostly fall, and Paul went away promising to implement such a scheme. After chatting around with a few actual astrophysicists and fine-tuning the system, it’s now done! That wasn’t so hard, was it? (Part of the reason this hadn’t happened much earlier is that certain astrophysicists who will remain nameless took an “eat your vegetables” approach to the problem, insisting that it was good for anyone to look at every single astro-ph abstract if they were possibly interested in any of them.)

Here is what I was happy to find in my email just now:

By popular request, the Astrophysics (astro-ph) archive has been split into six subcategories:

CO Cosmology and Extra-Galactic Astrophysics
EP Earth and Planetary Astrophysics
GA Galactic Astrophysics
HE High Energy Astrophysical Phenomena
IM Instrumentation and Methods for Astrophysics
SR Solar and Stellar Astrophysics

For more information, see the subcategory descriptions at http://arxiv.org/archive/astro-ph (including links to the subdivided new and recent listings). This split should make announcements of new papers more manageable for those interested only in subsets of astro-ph. New astro-ph submissions must be assigned one or more sub-categories. (Existing astro-ph articles will be machine-classified according to the new scheme when enough training data has been collected.)

To subscribe to the daily e-mail notifications for only a set of subcategories, you should first cancel your existing subscription, and then subscribe only to the subcategories of interest via physics. See http://arxiv.org/help/subscribe. For example, you could send two emails:

--------
To: astro-ph@arxiv.org
Subject: can

--------
To: physics@arxiv.org
Subject: subscribe [Your Name]

add CO
add GA
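For the script-inclined, here is a minimal sketch of how those two messages could be composed in Python. It only builds the messages and does not send anything. The function name `build_subscription_messages`, the `sender` argument, and the reading of "[Your Name]" as a placeholder for your actual name are my assumptions, not anything arXiv specifies; the addresses, subject lines, `add` commands, and subcategory codes are taken from the announcement above.

```python
from email.message import EmailMessage

# The six subcategory codes listed in the announcement.
ASTRO_PH_SUBCATEGORIES = {"CO", "EP", "GA", "HE", "IM", "SR"}


def build_subscription_messages(sender, full_name, subcategories):
    """Build the cancel and subscribe emails described in the announcement.

    Hypothetical helper: it returns the two messages without sending them.
    """
    unknown = [code for code in subcategories if code not in ASTRO_PH_SUBCATEGORIES]
    if unknown:
        raise ValueError(f"unknown astro-ph subcategories: {unknown}")

    # Message 1: cancel the existing astro-ph subscription (no body needed).
    cancel = EmailMessage()
    cancel["From"] = sender
    cancel["To"] = "astro-ph@arxiv.org"
    cancel["Subject"] = "can"

    # Message 2: subscribe via physics, with one "add" line per subcategory.
    subscribe = EmailMessage()
    subscribe["From"] = sender
    subscribe["To"] = "physics@arxiv.org"
    subscribe["Subject"] = f"subscribe {full_name}"
    subscribe.set_content("\n".join(f"add {code}" for code in subcategories) + "\n")

    return [cancel, subscribe]


if __name__ == "__main__":
    for message in build_subscription_messages(
        "you@example.org", "Your Name", ["CO", "GA"]
    ):
        print(message)
        print("--------")
```

Actually sending them would be a matter of handing each message to whatever mail setup you already use (for instance `smtplib.SMTP.send_message`), and it is worth checking the help page above first in case the commands have changed.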

O frabjous day! Callooh! Callay! Undoubtedly some curmudgeons will gripe that their particular kind of research doesn’t fit snugly into any one of the categories. Fair enough; let the powers that be know, and they’ll do whatever is reasonable to make sure the system evolves appropriately. But for right now, my early evenings (abstracts appear at 5 p.m. Pacific time) just got a little brighter.

56 thoughts on “astro-ph Rationalized”

  1. It’s clearly part of a devious plot: applicants to graduate school will have to specify which sub-arxivs they like to read, and if they pick the wrong one their applications will be thrown in the trash without a second glance!

  2. I’m not really against this, and I don’t think we should force everyone to see all the astrophysics papers (as long as the option to see all of them remains). Though, like others I am concerned about how this was decided.

    Isn’t Cosmology and Extragalactic Astrophysics a rather big category? Even given how many extragalactic papers are classified under Galactic Astrophysics, it’s still a vast range (under the recent submissions, you’ll see that it has 15 papers; the rest have 1, 3, and 7-9). In particular, I think there’s a rather big difference between the very-high-energy very-very-early very-speculative cosmology papers and the more mundane z = 0 clusters papers. To use your example, I would guess that there is as big a gulf between the people who read the quantum-gravity cosmology papers and papers on galaxy clusters as there is between the people who read galaxy cluster papers and W UMa light curve papers. In fact, a lot of people who work in galaxy clusters probably have been or are astronomy students rather than physics students, in which case they’ll have some idea of what the variable star people are talking about but no grasp of the quantum gravity papers.

    So, I think there would be some merit to splitting cosmology into an Observational Cosmology and Extragalactic section and something like a High Energy Cosmology section — papers that are cross-listed under gr-qc and the hep categories would tend to fall in the latter. There would be considerable overlap, of course — dark energy, astrophysical detection limits on dark matter, for example — but there’s also overlap between the other categories in astro-ph.

    Also, I wouldn’t want High Energy Cosmology to be sloughed into the current High Energy Astrophysics section. Again, while there’s considerable overlap (especially with dark matter detection), most of the High Energy Astrophysics section deals with the high energy tail of non-cosmological things that don’t require quantum field theory or general relativity to understand. Most High Energy Astrophysics papers have strong connections with several branches/sections of astrophysics (supernovae, the ISM, pulsars, AGNs, even planets). The High Energy Cosmology, on the other hand, tends not to make much contact with most of astronomy, and I think is largely a different community.

    A few other gripes:
    * Gravitational wave production and detection is listed under “Solar and Stellar”. What about gravitational waves from merging supermassive black holes (detectable with LISA) or from inflation?
    * A lot of nebulae might also fall under “Solar and Stellar” as well as “Galactic”, so there’s some overlap.
    * I’ll echo yet another astronomer’s suggestion of a Physics of Astrophysics subcategory. Hydrodynamic instabilities, turbulence, and magnetic dynamos would fall into it naturally, for a start.

    Seriously, people are being offered an option that they can choose to take advantage of, but are not being forced to, and they’re complaining about it?

    “By the way, honey, I took out another mortgage for a summer home. Why are you so upset? It’s not like you have to live there, you know.”

  3. Sean @ 1:33pm:

    People aren’t complaining that there’s a new option; people are complaining that after 10+ years this finally happened and it’s useless to so many of us. A nice, general-purpose, well-thought-out system could have been used; instead, a few guys bullshitting over lunch decided that the entire discipline could be most usefully divided up into 6 non-overlapping chunks, and so that’s how it was done.

    A *lot* of thought has gone into categorization of astrophysics, and into tagging of large online databases. Ideas could have been taken from either or both of those, but were not.

    For someone interested in methods (say, different computational or observational techniques) or underlying bits of physics (hydro instabilities and dynamo theory, as pointed out above, or CR acceleration, or turbulence, or…) with wide application to lots of different sorts of objects, the above categorization is *exactly* orthogonal to what would be useful.

  4. The underlying problem is that this wasn’t done as a useful, interesting way of categorizing astrophysics papers; it was clearly done as a response to “man, am I ever tired of seeing the abstracts for those boring stellar astronomy [or whatever] papers.”

  5. “Seriously, people are being offered an option that they can choose to take advantage of, but are not being forced to, and they’re complaining about it?”

    Probably because it means that a different, better option will not be soon in coming.

  6. Look, for purposes of an entertaining post I focused on the conversation I had with Paul and Mark. Which I do believe was important in actually sparking something to get done. But, as others have noted, something like this has been discussed for a very long time, and a lot of input was solicited, including after our conversation. (And, more importantly, it was checked that the proposed scheme would fit as many papers already on astro-ph as possible.) It had been held up because of people who thought that any sub-division was bad, and the fact that the tiny arxiv organization had other priorities to get to first.

    There are a large number of ways to categorize research in astrophysics: by type of objects being observed, by methods being used, by wavelengths observed or instrumentation employed, etc. No system will ever keep everyone happy. One could have formed committees, and solicited white papers, and held town hall meetings, and something might have been done within the next twenty years. Or one could just do it, and then tweak the system in response to community input. Which will be listened to, especially once the system has had a chance to be tried out, and especially if it’s of the constructive type (“these should be the categories” rather than “your categories suck”).

  7. “There are a large number of ways to categorize research in astrophysics: by type of objects being observed, by methods being used, by wavelengths observed or instrumentation employed, etc.”

    Yes, rather. Wouldn’t it be lovely if someone had noticed that before the implementation of this one-dimensional system?

    Many such categorization systems which are well tested *already exist*. ApJ has a multi-‘dimension’ hierarchy, for instance. A stripped-down system with categories for scale of object (as now), type of object, type of physics, and type of method would have worked. Simple tagging with arbitrary tags would have sufficed. The web has a lot of that already.

  8. “Look, for purposes of an entertaining post I focused on the conversation I had with Paul and Mark.”

    Except that the post isn’t entertaining. It’s just a smug “YOU’RE WELCOME” to a community that didn’t ask for your help. You can’t be surprised when it comes off as off-putting.

  9. These complaints are, largely, silly. The arxiv has always relied on relatively broad, overlapping, loosely-defined categories (astro-ph, gr-qc, hep-ph, hep-th, …). They’re a crude sorting mechanism, not a detailed hierarchy that classifies every paper. Anyone is free to go design their own classification scheme where papers can get multiple tags and one can search for papers in the intersection of “gamma rays” and “dark matter”, or whatever else you like. Spires, for instance, gives many different ways of indexing and searching papers related to high-energy physics, and it’s frequently far more useful than the arxiv itself for finding a paper. But that isn’t what arxiv does. Arxiv gives a place to store the papers, and a convenient daily listing. Exhaustive categorizing and tagging is not a useful way to get a daily listing. The most useful scheme for the arxiv is one where the categories are relatively big, so you don’t miss anything, but not so big that you have difficulty getting through the daily list. The new astro-ph scheme looks like a nice compromise.

  10. Sean, there are a couple of reasons why people might bitch, even if it is unfair to you. Yes, I know that subcategorization has been debated for a long time. In fact, the 2005 reorg plan (see for example http://golem.ph.utexas.edu/~distler/blog/archives/000643.html) was significantly worse: it didn’t even have an instrumentation category. I will refrain from speculating on why.

    Reason 1: Astronomers do have to know some things outside their little subfield. You can’t understand galaxies without knowing basics about stellar evolution, and a result on globular clusters or Galactic star formation can influence extragalactic or cosmology papers. This is less true of, say, condensed matter vs. particle physics. Many astronomers like this, and want to encourage it in students, aka “Eat your vegetables.” (Just like they want both faculty and students to attend every week’s colloquia and not skip the extragalactic/galactic/planetary ones.) Now, I recognize that we can still read all of astro-ph if we want. But this is a symptom of the field getting bigger and less manageable and people devoting their time to the endless grant and teaching cycle rather than listening to their colleagues. So lots of people will resent it even if it’s inevitable.

    Reason 2: It activates suspicions that the subdivision is driven by physics and cosmology types who don’t want to see the titles of, much less read, papers about globular clusters or A stars. Of course, there are, in fact, people out there who work on A stars and don’t want to read yet another speculative paper about the microwave background. But they tend not to be vocal about it. Again, this is partly a symptom of a shift in the profession: cosmological tests, dark energy, and the like are attracting a lot of interest, and money, from the physics side and many astronomers are a little worried that this will re-order priorities or make them small parts of large machines. This is another thing that we resent because it’s inevitable. It’s like resenting globalization, or when your neighborhood bank got bought by Citibank. You have to deal with it, but you don’t have to like it.

    So we’re just shooting you, and the arXiv, because you’re the messenger. I hope that makes the holes easier to patch.

  11. Hmmm… I wonder how large the training set needs to be to categorize past entries. I’m guessing, or rather hoping, that the threshold is set low enough that past entries are tagged into multiple groups if they contain enough keywords from various categories. Does automatic categorization dump the entries into separate folders or are the entries just tagged? If the system can learn from individual subscribers’ interests, then it should be that much better. This shouldn’t be that big a deal in the end. Good improvement.

  12. Pope Maledict XVI

    The funny thing here is all the glee at not having to look at the titles of papers on the arXiv. I suspect that, for many people, reading the arXiv is a painful duty, and *anything* that reduces the time spent on it is welcome. As somebody said on another blog, the best thing about Christmas is that the arXiv gets turned off for a day. It tells us something about our true feelings about the kind of research we are all doing and [reluctantly] reading about these days.

  13. I, for one, appreciate this attempt. For anyone who doesn’t like it, the old option is still available.

    But is there any hope of something like this being done for hep-th, hep-ph, and gr-qc? Why only astro-ph?

    Is there a place where suggestions for categories are being accepted?

  14. In my dream world (and this is really dreaming) I could go to arXiv, select from a small collection of keywords, and have a custom RSS feed appear.

    Again, that’s in my dream world; although I can’t really think of a reason why they can’t do it!

  15. Omar, I didn’t know about myADS, I should check that out! The cosmocoffee app (written by Antony Lewis) allows you to filter all the arxiv sections you want to read (not just astro-ph) with your personal selection of keywords, bookmark papers you like with the option to add notes, and also add them to your local journal club (there is an extra journal club functionality you have to set up first). It’s a neat tool.

  16. @Hiranya, @Questioner:

    Almost all of Questioner’s requests are satisfied by cosmo coffee, EXCEPT that while you can get customised arXiv results on a web page, you cannot get an RSS feed of these results (AFAIK).

    Overall, I prefer cosmo-coffee to myADS because I can add authors’ names into the mix as well as abstract keywords.

  17. Hi Mike, I am afraid I am one of those people who never warmed to RSS feeds. I like sitting down with the day’s “catch” from the cosmocoffee filter with a cup of coffee, reading through and bookmarking the papers I want to read more carefully. I guess the RSS feed is useful if you don’t read the arxiv every day (e.g. while you are travelling)? I can ask Antony whether he can add it. I also get the full arxiv abstract email and if I have time I quickly page through it to see if the filter missed something interesting, but it only does so very rarely once a good set of keywords is in place.

Comments are closed.
