Instructions around the usage of meta robot tags and robots.txt files
Why are they banning the methods instead of saying how the pricing should be displayed? E.g. why say you cannot have "noindex" instead of saying "the pricing information needs to be in plain-text, human-readable, accessible, indexable..." and so on?
They are doing that as well. They are posting general requirements, as well as providing details on specific situations. From the main project @ https://github.com/CMSgov/price-transparency-guide
Section: Overview
> All machine-readable files must [...] made available to the public without restrictions that would impede the re-use of that information.
Section: Public Discoverability
> These machine-readable files must be made available to the public without restrictions that would impede the re-use of that information.
They do both. They also provide a description of the regulation's goals, context and examples. This is how regulation works. It typically starts with a rather vague law, then the regulatory agencies make up general rules to implement said law. Then they create a bunch of more detailed rules. Then as times change, they amend those rules. You can even ask them about novel situations and get a (non-binding) "opinion" from a government agency. In my experience, the federal gov regulatory apparatus is not as inept as most people seem to think.
That’s a good way to write a law but regulations are guidance under which the regulator won’t come after you when it considers law enforcement action i.e. the law enforcement agency’s interpretation of the law and your obligations under it. They can afford to get into the nitty gritty, and it can even be beneficial for them to do so for all involved.
This way lawyers don’t have to divine from the law that noindex and nofollow tags might invite enforcement action, if they have even heard of them before. They can read the regs and advise their companies and clients properly.
Why not both?
Because wouldn't they have to keep adding new techniques to the list continuously? In the other case they can just say what format the content needs to be available in.
Indexable is open to interpretation. No disallow directives in robots.txt isn't.
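To illustrate why a robots.txt rule is mechanically checkable, here is a minimal sketch using Python's standard urllib.robotparser. The sample robots.txt bodies, the example.com URLs and the is_crawlable helper are made up for illustration; nothing here comes from the CMS guide itself.

```python
from urllib.robotparser import RobotFileParser

def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Hypothetical robots.txt that hides a pricing directory from all crawlers.
BLOCKING = """User-agent: *
Disallow: /pricing/
"""

# Hypothetical robots.txt with an empty Disallow, which permits everything.
OPEN = """User-agent: *
Disallow:
"""

if __name__ == "__main__":
    rates = "https://example.com/pricing/rates.json"
    print(is_crawlable(BLOCKING, rates))
    print(is_crawlable(OPEN, rates))
```

The answer is a plain yes or no for a given URL and crawler, which is the kind of test a regulator can apply without arguing over what "indexable" means.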
Better yet in a database with consistent schema
Nice. They should probably add HTTP headers (X-Robots-Tag) to the list. Cloaking, too.
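For anyone wanting to audit a page for both variants, here is a rough sketch that looks for blocking directives in an X-Robots-Tag response header and in a robots meta tag. The blocking_directives helper and the set of directives it treats as blocking are my own assumptions, and it deliberately ignores the user-agent-prefixed header form (e.g. "googlebot: noindex").

```python
from html.parser import HTMLParser

# Directives treated as "hiding" the page; this set is an assumption.
BLOCKING_DIRECTIVES = {"noindex", "nofollow", "none"}

def _split(value):
    """Split a comma-separated directive list into lowercase tokens."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}

class _MetaRobotsParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {k: (v or "") for k, v in attrs}
        if attr.get("name", "").lower() in {"robots", "googlebot"}:
            self.directives |= _split(attr.get("content", ""))

def blocking_directives(headers, html):
    """Return the blocking directives found in the headers or the HTML."""
    found = set()
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            found |= _split(value)
    parser = _MetaRobotsParser()
    parser.feed(html)
    found |= parser.directives
    return found & BLOCKING_DIRECTIVES

if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(blocking_directives({"X-Robots-Tag": "noindex"}, page))
```

Cloaking is harder to catch this way, since it means serving different content to crawlers than to people, so a single fetch cannot reveal it.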
And ban the use of PDFs - which is another way this could be avoided.
Oh, and mandate clean links: no funky JavaScript links that search crawlers don't follow.
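As a rough idea of what "clean links" could mean in practice, the sketch below separates anchors with a real href from anchors that only work through script. The audit_links helper and the rule it applies (a missing href, "#", or a javascript: URL counts as script-only) are assumptions for illustration, not anything defined in the guide.

```python
from html.parser import HTMLParser

class LinkAudit(HTMLParser):
    """Sorts <a> tags into crawlable links and script-only links."""

    def __init__(self):
        super().__init__()
        self.crawlable = []
        self.script_only = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = (attr.get("href") or "").strip()
        # Assumed rule: no href, a bare "#", or a javascript: URL means
        # a crawler has nothing it can follow.
        if not href or href == "#" or href.lower().startswith("javascript:"):
            self.script_only.append(attr.get("onclick") or href or "(no href)")
        else:
            self.crawlable.append(href)

def audit_links(html):
    """Return (crawlable_hrefs, script_only_links) for an HTML string."""
    audit = LinkAudit()
    audit.feed(html)
    return audit.crawlable, audit.script_only

if __name__ == "__main__":
    sample = (
        '<a href="/rates.json">Rates</a>'
        '<a href="javascript:void(0)" onclick="load()">Prices</a>'
    )
    print(audit_links(sample))
```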
Per the main project link, the pricing information is required to be posted in JSON format with a specific schema and file names.
JSON is allowed, but not required. They just required open, non-proprietary, formats. They specifically gave YAML and XML as other examples.
Search engines can index pdfs.
Yeah, but they don't work as well and won't be as discoverable.
Provided the content is not encrypted.
I stand corrected: encrypted PDFs are indexed unless a password is set. However, in the past they were unable to do so[0]: "Generally we can index textual content (written in any language) from PDF files that use various kinds of character encodings, provided they're not password protected or encrypted."
[0] https://developers.google.com/search/blog/2011/09/pdfs-in-go...
I think the assumption is that encrypted pdfs would not be used here because users still have to view these documents
No, the PDF standard explicitly allows encryption without the need to set up user password; in this case, the owner password is set. The only change for the user is that the word "Secured" usually appears next to the filename in the reader.
If you check the main README, proprietary formats are specifically not allowed, with PDF and Excel as notable examples.
As I read this, it seems like this is saying that if you are saying that you are posting public pricing information, you should not be hiding that information from search engines.
Seems like it's trying to avoid people (I guess it's targeted toward insurers, if I'm reading the main page correctly) saying that they are transparent about prices but hiding the information so no one sees it.
Please correct me if I'm wrong on this.
They should retroactively penalize hospitals for trying to hide this information as it is clearly against the spirit of the law.
The US Constitution prohibits “ex post facto” laws. You can’t punish someone for something that wasn’t a crime when they did it.
Sure. I'm not saying make a new law then punish. I'm saying they should be punished based on the existing law that said prices had to be published. Similar to punishing someone for tax evasion even though the specific technique they used wasn't explicitly enumerated in the law.
This will be a fun example of legalism v. hacker ingenuity. Every time they figure out a specific legal way to write up a trick to hide prices, the folks will find another loophole to hide them.
Honestly. This could be a fun race to the bottom.
Typo in headline?
We've changed the title from "Regulartors ban hospitals from using “noindex, nofollow” on pricing pages" to what the page says.
Submitters: If you want to say what you think is important about an article, that's fine, but do it by adding a comment to the thread. Then your view will be on a level playing field with everyone else's: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&so...
My brain initially parsed "meta robot" as "megarobot" and I was really excited until I got to the last part of the sentence.
Nope, they're artistic regulators
At least they are keeping up with current technical tricks.