A second look at SEER

Last week, a friend asked if I had come across a source evaluation tool which interacted with Turnitin’s text-matching software. Attached to the email was a copy of Turnitin’s Source Educational Evaluation Rubric (SEER).

That was news to me! Interactive with Turnitin? Trying to work out why my friend thought SEER was interactive, and with Turnitin, took me down some strange paths. And the search got me taking a second look at SEER, a second look and a closer look. A strange journey.

I had in fact been alerted to the release of the rubric back in January 2013, in a press release from PRNewswire, and an invitation from Turnitin to join a webinar on “What’s wrong with Wikipedia?” This was a webcast designed to highlight a new Turnitin White Paper, What’s wrong with Wikipedia?: Evaluating the Sources Used by Students. It was the launch of SEER.

The webcast itself was something of a disaster – Turnitin’s internet connection went down! But Turnitin later released a recording. And although Jason Chu, one of the two presenters, mentions in passing that the White Paper includes a copy of the interactive SEER rubric, there was no demonstration, nothing to indicate that this was something out of the ordinary.

The SEER rubric offers a way of evaluating websites and other sources in terms of five criteria: Authoritative (sic), Educational Value, Intent, Originality and Quality. As such, it is similar to other source evaluation checklists and rubrics such as Kathy Schrock’s Critical Evaluation Surveys for elementary, middle and high schools; or (adaptations of) Howard Rheingold’s CRAP test or McHenry County College Library’s Source Evaluation Rubric, and there are many many more.

[Even as I was editing this for publication, Dianne McKenzie published her own C.R.A.P. Test Rubric on Library Grits, well worth looking at.]

And, yes, the SEER rubric is interactive. But it isn’t interactive with Turnitin software. What it is interactive with is Adobe Reader software. You input your value and the (Adobe) software works out the SEER score. It is almost like magic. Well, no, maybe not almost like magic. It’s an interactive form, that’s what it is.

It’s subjective. It’s you, the user, the student or the teacher, who looks at a source and decides which of each criterion’s descriptors best match the source, and then you tick the box. The software adds the score and gives a rating; the higher the rating, the more useful is the source, in terms of authority, educational value, intention, originality and quality. Subjective. And interactive too.
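For the curious, here is a minimal sketch of the sort of sum the form might be doing behind the scenes. It is a guess, not Turnitin's actual form logic: I am assuming that each of the five criteria is scored from 0 to 4 and that the overall rating is a plain average. The function name `seer_rating` and the sample scores are made up for illustration.

```python
from statistics import mean

# The five SEER criteria.
CRITERIA = ("Authority", "Educational Value", "Intent", "Originality", "Quality")


def seer_rating(scores: dict[str, float]) -> float:
    """Return an overall rating as the mean of the five criterion scores.

    Assumptions (not confirmed by Turnitin): each criterion is scored
    from 0 to 4, and every criterion carries equal weight.
    """
    missing = [name for name in CRITERIA if name not in scores]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    for name in CRITERIA:
        if not 0 <= scores[name] <= 4:
            raise ValueError(f"{name} must be between 0 and 4")
    return round(mean(scores[name] for name in CRITERIA), 2)


# Hypothetical scores, as one reader might tick the boxes for one source.
example = {
    "Authority": 4.0,
    "Educational Value": 3.0,
    "Intent": 3.0,
    "Originality": 2.0,
    "Quality": 3.0,
}
print(seer_rating(example))
```

The arithmetic is the easy part. The numbers going in are still one reader's judgement.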

I did not appreciate this interactivity the first time I looked. But then, I use Skim and Preview on a Mac, and the rubric is not interactive in these readers. When I had a PC, I used Foxit and Sumatra to read PDF files. Probably there would have been no interactivity there either. I wonder how many users of other non-Adobe PDF reader software missed out as well?

But, clearly many SEER users do use Adobe Reader. What is more, so many of them were amazed by the interactivity of the form that Turnitin produced a video SEER: making of the interactive rubric PDF to explain how they did it. I’m waiting (though not with bated breath, I have to admit) for the sequel, Son of SEER.

So, that off my chest, my friend’s query answered and with her encouragement, I continued my close look at the SEER rubric. For during the search, I had come across a page on the Turnitin site, Ratings for Top Student Sources, and this looked interesting.

It seems that, following the “What’s wrong with Wikipedia?” study, Turnitin took the 100 sites most often identified by Turnitin software as the source of secondary school (SS) students’ matched content during the year 2011-2012, and asked secondary school teachers to rate the sites, using the SEER rubric. They did the same with the 100 sites most often identified by Turnitin as the source of higher education (HE) students’ matched content, having higher education teachers and instructors rate these sites, again using the SEER rubric.

The SEER scores were averaged out for each level, and the result is Ratings for Top Student Sources.

Interactivity

This page shows how teachers and instructors rate the most commonly used sites in terms of the SEER criteria; it is a measure of “worthiness”. What’s more, this page really IS interactive. You can sort the sites by the individual criteria or their overall scores, low-to-high or high-to-low, or by the most-to-least (or least-to-most) commonly used (of the top 100 for each of the two levels of education). What is more, you can look at just the academic sites, or just the news & portals sites, or the social sites or the paper mills, or the encyclopedias or the shopping sites, sorting within each.

Students, it seems, choose unreliable sources. Turnitin aims not only to confirm what many teachers already suspect – if they do not already know. It aims also to provide a tool which will allow students to judge the merits of sites they look at for themselves.

All of which provides food for thought.

The sites, for instance. The criteria. The sites and the criteria. And Turnitin itself, the way it identifies content matches.

The sites.

We do need to remember that the sites identified as being most used are not necessarily those which the students writing the papers used. They are the sites on which Turnitin found text matches. Nor is this about plagiarism; there is no indication as to whether the text matches found were cited and referenced, in quotation marks or bare. This is not about plagiarism, it is about the quality of the sources which students use.

There could be some inaccuracy in the statistics gathered. When Turnitin claims that 3,801,022 matches found on higher education papers came from Wikipedia, we do not know if this is a high or a low estimate. Wikipedia writers may well have taken material from other sites, cited or otherwise. (The other way too, of course; writers have been known to take their material from Wikipedia, cited or otherwise.)

Given Turnitin’s propensity to highlight Wikipedia URLs as matches (see, for instance, Carried Away), it may well be on the high side. (Book titles too, and this may explain other anomalies mentioned below.)

These considerations aside, there is still the question, how did some of the allegedly popular sites get there?

Google.com, for instance, gets an overall rating of 1.8 (out of 4.0) on the SEER scale. Google is classified as a News and Portal site, and Turnitin found 98,330 matches for Google content in higher education papers. Did I get that right? Turnitin found content used in HE student papers that matched content on the Google site 98,330 times. This raises the question: how much content is there on the Google site – content which students can find – content which is useful enough for them to include in their academic papers? This is not material found at the password-entry protected sites.google.com (460,948 HE matches); that is listed separately. It isn’t material from books.google.com (407,016 SS matches). This is Google proper.

What about WorldCat.org, rated 3.07 at the secondary schools level, with 43,547 matches, and also uncategorised? How much content is there on WorldCat? Not a lot, surely? Surely not enough for it to score as one of the top 100 sites favoured by secondary school students?

[WorldCat is a catalogue of library catalogues; typically you use it to find which libraries own given titles, or to learn if a library near you owns a given title. Just as Turnitin identifies URLs as text matches, is it finding matches for book titles on WorldCat?]

Where is Turnitin pulling up these sites, what matches is it really finding?

And the ratings? How were they obtained? How did WorldCat get a 3.07 rating when Questia scored 2.87? Yale University got a perfect 4.0, but Sage Publications only 3.33. Springer Link got 3.40, a mite less than the New York Times, 3.46. The highly-regarded journal-aggregating site Highbeam scores just 2.31 at the HE level? A rating that low means that the site is considered barely credible.

[For those who do not know, Questia is a subscription online library; it aggregates content published elsewhere. Highbeam is an archive of newspapers, magazines and journals, and is part of the Gale-Cengage group; access is by subscription. Springer Link and Sage are highly respected publishing houses which specialise in academic and scholarly journals; once again, access is by subscription, though individual papers may be bought on demand. It is encouraging to see such sites used as sources in student work, but given their scholarly nature – especially the two publishing groups – why aren’t their ratings higher?]

It is good to have a method for evaluating sites and sources – but only if it really is useful, if it really works. Does SEER work? Just what do the ratings mean?

Closely related to these questions, how were the ratings obtained?

There I think we have the answer, part of it. On another Turnitin webcast, Grading Student Sources: Rating the Top 100 Sources Students Use, we are told that Turnitin asked secondary school teachers to evaluate the sources most used by secondary school students and higher education teachers and instructors to evaluate their most common sources. We are not told how many teachers were involved, nor what training or instructions they were given, whether they even had to look at the sites or just go by reputation, or perhaps just the site name, before ticking the boxes.

Did they look at sites they did not know? I wonder. If they had looked at sites unknown to them, they might have raised questions about some of the sites, might have rated some sites more highly – or less highly.

And other oddities. The site, e-reading.co.ua, classed as Academic, links to a site partly in English and partly in Russian: here you can get free downloads of English-language books, still in copyright. (Noted: the link redirects to http://www.e-reading.co.uk/.) Or icedrugaddiction.com, also classed as academic? Or lib.ru, this time wholly in Russian?

Come to that, how do they – how do you – rate a site against the criteria? It might be possible to rate a page – but a whole site?

So let’s look at the criteria, two of them, anyway.

“Educational value” got me thinking: “Does the site content help advance educational goals?”

How can you tell? That is such a wide catch-all. Doesn’t it depend on the curriculum, on the unit? On what is being taught? On what students are using the site for – though they probably are not using the site, just specific pages found on the site? APA or PubMedCentral – both scoring 4.0 – may be useful sites if you happen to be studying Psychology or Medicine respectively, but if you are a history professor or a Spanish language teacher your educational goals may not be so readily met.

APA? The American Psychological Association? High educational value? This is another site with limited content openly available on their website, mainly newsletters and abstracts and blurbs. Yet 504,088 content matches were, apparently, found in secondary school student papers. Curiouser and curiouser. (Pedantic disclosure: Alice said that before I did.)

Educational value? So much depends on purpose. And if one is given free choice of subject, freedom to decide one’s hypothesis or research question, who can say?

Then there is “Originality: Is the site a source of original content and viewpoints?”

Hmmm. Do we really want originality? Possibly, possibly if supported by evidence, research, argument, cited sources. Is that what is meant in the SEER rubric?

The site tomdaschle.com scores 4.0 for Originality (though much lower against the other criteria). I do not know a lot about Tom Daschle, I must admit. I gather that he was a respected US politician until he became less respected.

What seems super-original about this website is that it has been defunct for some years.

This is a particularly puzzling source of matched content. The original site was last sited – sighted – on 2 December 2004. Since then, according to Internet Archive captures, the site has disappeared, reappeared with redirection, reappeared as a general sales site, disappeared again, and once more reappeared – at latest sighting, the domain is up for sale!

Evolution of the tomdaschle.com site:

An early sighting: 4 April 2004
2 December 2004 - gifs not captured but still tomdaschle.com?
23 March 2005 - nobody at home (on the tomdaschle.com site)?
25 December 2005 - redirecting to NewLeadershipForAmerica.com
3 November 2007 - redirecting to NewLeadershipForAmerica.com
7 February 2011 - back at tomdaschle.com
6 September 2011 - during the SEER analysis
25 June 2013 - gone missing again!
and today, 16 December 2013, tomdaschle.com - "listed in the marketplace."

While the site in its various incarnations is a mystery, the present question is, how did Turnitin find 1,812 HE content matches on the tomdaschle.com site when there was so little there to be found, so little for HE students to use?

Perhaps there is magic here after all.

What about other sites given good ratings for originality? Not many sites score highly here; originality may be difficult to achieve. Amongst the HE sites with the highest ratings for originality we find two encyclopedias (Encyclopedia Britannica and encyclopedia.com), a dictionary (dictionary.com), and three archives (Bartleby, Project Gutenberg, and Internet Archive). Secondary school sites get higher ratings for originality than sites in the HE sector. Six of the 13 sites scoring 3.7 or above for originality at the SS level are health and medical sites. What exactly is originality, in the SEER context?

And the other criteria, what about them?

“Authority: Is the site well regarded, cited, and written by experts in the field?”

“Intent: Is the site a well-respected source of content intended to inform users?”

“Quality: Is the site highly vetted with good coverage of the topical area?”

They are useful but … you have to know to know. And that takes investigating. I doubt students at secondary level would have the patience or the motivation to go exploring how well regarded a page or a site is, how often cited by others. Would undergraduates? This might be desirable, but is it practical? Quality? How can you tell without already having good grounding in the subject? Intent? Similarly. I am none too sure how thorough was the investigation performed by those who rated the sites for Turnitin. As a student tool, there are even more problems, all the above, plus.

And what about other criteria which might be used? Relevance to the purpose? Currency or contemporaneity? Bias? Audience? Language? The best-regarded site is useless if the reader does not understand the content or if it does not help with the purpose of visiting.

And if we are dealing with sites, then perhaps factors such as readability and findability and navigation and more might be useful considerations?

Perhaps with thoughts of these other criteria, perhaps used to analyse pages rather than sites… then SEER might be a useful tool. But the Sites Rating page? Haplessly misleading, methinks.

In recent years, Turnitin has attempted more and more to escape its “Gotcha!” image and be seen to promote good writing and good writing practice. Promoting the use of reliable and authoritative sources is a step in the right direction. But SEER doesn’t quite hack it, and nor do these ratings. Some way to go, Turnitin. And now I’ve had my second look, perhaps you should too?

Addendum:

This investigation is not intended to be exhaustive; there may be other anomalies. I am sure there are.

Just for the record, the following table gives the number of papers submitted to Turnitin between July 2011 and June 2012 (suggesting at least two more typos on the Ratings for Top Student Sources page) and the number of matches found, at each level of education.

	Papers submitted	Content matches found
Secondary School students	8,000,000	44,000,000
Higher Education students	28,000,000	112,000,000

Sources: Turnitin White Paper: The Sources in Student Writing – Secondary Education Sources of Matched Content and Plagiarism in Student Writing. iParadigms, 2012 (3).
Turnitin White Paper: The Sources in Student Writing – Higher Education Sources of Matched Content and Plagiarism in Student Writing. iParadigms, 2012 (3).
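To put the figures quoted earlier in proportion, here is a small sketch that works out matches per paper at each level, and the share of all HE matches attributed to Wikipedia and to Google. The totals are the rounded ones in the table above, so the results are approximate; the function names are mine, for illustration only.

```python
# Rounded totals from the two White Papers, July 2011 to June 2012.
TOTALS = {
    "Secondary School": {"papers": 8_000_000, "matches": 44_000_000},
    "Higher Education": {"papers": 28_000_000, "matches": 112_000_000},
}


def matches_per_paper(level: str) -> float:
    """Average number of content matches per submitted paper."""
    totals = TOTALS[level]
    return totals["matches"] / totals["papers"]


def share_of_matches(site_matches: int, level: str) -> float:
    """Percentage of all matches at this level attributed to one site."""
    return 100 * site_matches / TOTALS[level]["matches"]


for level in TOTALS:
    print(f"{level}: {matches_per_paper(level):.1f} matches per paper")

# Match counts quoted in the post for two HE sources.
for site, count in {"Wikipedia": 3_801_022, "Google": 98_330}.items():
    share = share_of_matches(count, "Higher Education")
    print(f"{site}: {share:.2f}% of HE matches")
```

On these rounded totals, that is 5.5 matches per paper at secondary level and 4.0 in higher education, with Wikipedia accounting for roughly 3.4% of all HE matches.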
