/gtin should expect URLs (URIs), as promoted by GS1 #3156
Comments
@philarcher suggests that
Is what most will need for GTIN extraction. Given the complexity I am wary of putting it in the spec as-is, but for now will point to it here, where discussion or tweaks could be more easily shared. This regex should parse all GS1 Digital Link URIs that encode GTINs (there are others...). It allows a GTIN to be 8, 12, 13 or 14 digits long (but not any other length). |
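The length rule (8, 12, 13 or 14 digits) can also be checked independently of any regex, once a candidate digit string has been pulled out of a URI. The sketch below is only an illustration: the function name `is_valid_gtin` is hypothetical, and it additionally verifies the standard GS1 mod-10 check digit, which a length check alone does not cover.

```python
def is_valid_gtin(candidate: str) -> bool:
    """Return True if candidate is 8, 12, 13 or 14 ASCII digits with a valid check digit.

    Hypothetical helper for illustration; not part of Schema.org or any GS1 library.
    """
    if not (candidate.isascii() and candidate.isdigit()):
        return False
    if len(candidate) not in (8, 12, 13, 14):
        return False

    digits = [int(ch) for ch in candidate]
    payload, check = digits[:-1], digits[-1]

    # Walking leftwards from the digit next to the check digit,
    # the weights alternate 3, 1, 3, 1, ...
    total = sum(
        d * (3 if i % 2 == 0 else 1)
        for i, d in enumerate(reversed(payload))
    )
    return (10 - total % 10) % 10 == check
```

Because the weighting is anchored at the right-hand end, the same function works for all four lengths, and a GTIN-13 padded with a leading zero to 14 digits keeps the same check digit.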
@philarcher @alex-jansen can you take a look at this. Raw schema file is https://raw.githubusercontent.com/schemaorg/schemaorg/main/data/ext/pending/issue-1244.ttl I will proceed towards staging but welcome your early review! (and anyone else's...) |
I should add that the definition already encourages GTINs as URLs in the textual part, which is why things are currently confusing and inconsistent. |
Let me add a bit of background. The horrendously complicated RegEx covers a lot of the flexibility that we include in the GS1 identification system. We can simplify it a lot if:
With those important restrictions we can start to make things simpler. Note that the idea is to convey the GTIN as a data element. By encoding it in a URI, you provide one possible place where that GTIN can be looked up - the default online place to look it up - but it doesn't need to be the only one. In other words, the domain name is not part of the identifier. For that to work, the structure of the URI must be adhered to. You can't just use any old URL for schema:gtin that happens to include the digits and you absolutely definitely categorically cannot use just any old URL that doesn't include a GTIN at all. And btw, GTINs have structure too. They're based on an issued prefix, there is a check digit and so on. They're not just numbers. See your local GS1 for details. OK, so here's the structure you need:
Again, this is all because I'm offering a restricted set of options to keep this simple. Chapter and verse is in the GS1 Digital Link URI Syntax standard. So these are OK: These are not OK:
OK, so with all that done, here's a slightly simpler regex:
^https?://([^\/?#:])*?([^?#])*?/01/(\d{8}|\d{12,14})(\?|$)
However, this is not foolproof. For example, the fourth 'bad example' above will actually match this. That's because the regex isn't sophisticated enough to check whether the URL includes an all-numeric parameter. If you can improve on this, OK! The simplest case, where there is just a domain name followed by /01/{gtin} and nothing else, will be matched by this:
^https?://([^\/?#:])*?/01/(\d{8}|\d{12,14})$
But that's more restrictive than we want to be. HTH - shout in my direction if you need more. |
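For anyone wanting to try the simplified pattern, here is a minimal Python sketch of the extraction step. The pattern is written with `*?` quantifiers and an escaped `\?`, which is an assumption about the intended form; the names `GTIN_URI` and `extract_gtin` and the example.com URLs are hypothetical. It inherits the limitation described above, so treat a match as a candidate GTIN rather than a verified one.

```python
import re
from typing import Optional

# Simplified GS1 Digital Link pattern from the discussion (assumed form):
# scheme, host, optional path, then /01/ followed by 8, 12, 13 or 14 digits,
# and finally either a query string or the end of the URI.
GTIN_URI = re.compile(
    r"^https?://([^\/?#:])*?([^?#])*?/01/(\d{8}|\d{12,14})(\?|$)"
)


def extract_gtin(uri: str) -> Optional[str]:
    """Return the GTIN digits from a matching URI, or None if it does not match."""
    match = GTIN_URI.match(uri)
    # Group 3 is the digit run that follows the /01/ path segment.
    return match.group(3) if match else None


if __name__ == "__main__":
    print(extract_gtin("https://example.com/01/09506000134352"))
    print(extract_gtin("https://example.com/gtin/09506000134352"))
```

The first call should yield the 14-digit string and the second `None`, since the path lacks the `/01/` segment. The pattern does not validate the check digit, so a separate check is still needed before trusting the value.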
Thanks @philarcher ! Would it make sense to nudge the path part of the URL towards using .well-known ? I appreciate the spec is already mostly baked, though... |
That's a potential help, @danbri yes, but there are problems with that kind of thing when we're shooting for mass adoption. Something for you and me to chew on. I'm v aware of the need to stick within URI design best practice (sovereignty of the server and all that). And use of /.well-known/ is part of that. As ever, we're navigating a path between Web purity and practical reality. |
@philarcher yes - I understand those tradeoffs! But given the existence of /.well-known/ why not at least register a path name for that, so that those who want to use a .well-known URL have something available? Would it be 'gs1'? 'gtin'? 'digital-link'? (getting offtopic but) for those serving pages at these URLs do you say anything about what structured data you hope they'll provide? |
I am in favor of registering a path name and of covering both cases:
|
This issue is being nudged due to inactivity. |
Sorry for coming late, I have a bunch of questions:
I wonder if it would not be useful to have a generic mechanism for attaching a digital-data-link to various objects (product, organization) |
We do not currently define /gtin as expecting URLs (URIs).
The most recent direction of the GTIN work at GS1 is all about enabling this, and the /gtin definition at Schema.org in its textual form already anticipates URI/URL values.
The lack of a declaration that the /gtin property has /rangeIncludes /URL is pure oversight.
Talking with @philarcher, it may also be useful to include the regex for extracting textual GTINs from a full URL/URI.